

Chapter 14 Statistics (Additional Questions)

Welcome to this dedicated supplementary practice section designed to thoroughly reinforce and extend your understanding of the fundamental concepts of Statistics introduced in Class 9. This crucial branch of mathematics provides the tools to collect, organize, analyze, interpret, and present data, enabling us to make sense of information and draw meaningful conclusions in a world awash with data. While the main chapter laid the groundwork by introducing methods for data presentation, graphical representation, and basic measures of central tendency, this collection of additional questions aims to build your confidence and proficiency through more extensive practice, larger datasets, and nuanced interpretation challenges.

Recall that your core studies focused on several key areas. You learned about the initial steps of data handling, including organizing raw data into meaningful structures like grouped frequency distribution tables. A significant emphasis was placed on visually representing this data through various graphical methods. This included revisiting histograms (bar-style graphs for continuous data) and introducing frequency polygons, alongside the basic measures of central tendency: mean, median, and mode.

This supplementary section provides vital practice, particularly focusing on areas that require careful handling and deeper understanding. Expect to work with more extensive raw datasets, demanding meticulous organization into appropriate grouped frequency distributions, including making informed decisions about class intervals. You will gain significant practice in constructing histograms, with a special focus on those challenging cases involving unequal class intervals. Here, simply using frequency as height is misleading; you must calculate adjusted frequencies or frequency densities (typically, $\text{Frequency Density} = \frac{\text{Frequency}}{\text{Class Width}}$) to ensure the area of each bar correctly represents the frequency, maintaining the visual integrity of the distribution.
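The frequency density formula above can be sketched in a few lines of Python. The class intervals and frequencies below are hypothetical values chosen only for illustration, and `frequency_density` is a helper name introduced here, not something defined in the chapter.

```python
# Hypothetical grouped data with unequal class widths: (lower, upper, frequency).
classes = [(0, 10, 5), (10, 20, 12), (20, 40, 16), (40, 70, 9)]

def frequency_density(lower, upper, frequency):
    """Return the frequency divided by the class width."""
    return frequency / (upper - lower)

densities = [frequency_density(lo, hi, f) for lo, hi, f in classes]

for (lo, hi, f), d in zip(classes, densities):
    # Bar area = class width * height, which recovers the original frequency.
    print(f"{lo}-{hi}: frequency={f}, density={d:.2f}, area={(hi - lo) * d:.0f}")
```

Using the density as the bar height keeps each bar's area equal to its frequency, so a wide class does not look more important than it is.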

Furthermore, constructing frequency polygons, both superimposed on histograms and independently, will be thoroughly reinforced. The calculation of mean, median, and mode for ungrouped data receives extensive attention, potentially involving larger datasets or data containing fractions or decimals, honing your computational accuracy. While complex calculations for grouped data central tendency are usually reserved for later study, practice identifying the modal class (class with highest frequency) or the median class (class containing the median value) will solidify your understanding of how these concepts apply to grouped distributions. Crucially, the emphasis extends beyond mere construction and calculation to interpretation. Questions will push you to analyze histograms and frequency polygons more deeply, perhaps comparing different distributions shown on the same graph or identifying specific trends and patterns within the data.

Engaging rigorously with these additional exercises is essential for becoming truly proficient in organizing raw data, mastering the accurate construction and insightful interpretation of histograms (including variable width cases) and frequency polygons, and solidifying your grasp of the basic measures of central tendency. These skills form the bedrock of statistical literacy, empowering you to understand and critically evaluate data presented in various forms.



Objective Type Questions

Question 1. In statistics, 'data' refers to:

(A) Raw facts and figures.

(B) Summarized information.

(C) Conclusions drawn from information.

(D) Graphical representations.

Answer:


In statistics, 'data' refers to raw facts and figures that are collected for analysis.

Summarized information, conclusions, and graphical representations are typically derived from the raw data after processing and analysis.

Therefore, the correct option is (A).

(A) Raw facts and figures.


Question 2. The number of times a particular observation occurs in a given data is called its:

(A) Range

(B) Class Mark

(C) Frequency

(D) Cumulative Frequency

Answer:


In statistics, the number of times a particular observation appears in a dataset is defined as its frequency.

The Range is the difference between the highest and lowest values in a dataset.

The Class Mark is the midpoint of a class interval.

Cumulative Frequency is the running total of frequencies up to a certain point.

Thus, the correct answer is (C).

(C) Frequency


Question 3. Consider the following data on marks obtained by 10 students in a test: 10, 12, 15, 10, 18, 15, 12, 20, 10, 15.

The frequency of the mark 10 is:

(A) 1

(B) 2

(C) 3

(D) 4

Answer:


To find the frequency of the mark 10, we need to count how many times the number 10 appears in the given data set.

The given data set is: 10, 12, 15, 10, 18, 15, 12, 20, 10, 15.

Let's list the occurrences of 10:

The mark 10 appears at the 1st, 4th, and 9th positions in the list.

The number 10 appears 3 times in the data.

Therefore, the frequency of the mark 10 is 3.

The correct option is (C).

(C) 3
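For longer lists, counting by hand becomes error-prone. As a minimal sketch, the same count can be checked with Python's standard `collections.Counter`; the variable names are chosen here for illustration.

```python
from collections import Counter

# Marks from Question 3.
marks = [10, 12, 15, 10, 18, 15, 12, 20, 10, 15]

# Counter maps each distinct mark to the number of times it occurs.
frequency = Counter(marks)

print(frequency[10])  # 3
```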


Question 4. In a frequency distribution, the difference between the upper limit and the lower limit of a class interval is called the:

(A) Class Mark

(B) Class Size

(C) Frequency

(D) Range

Answer:


In a frequency distribution, the difference between the upper limit and the lower limit of a class interval is known as the Class Size or Class Width.

Let $L$ be the lower limit and $U$ be the upper limit of a class interval.

Class Size $= U - L$

Let's examine the other options:

Class Mark is the midpoint of a class interval, calculated as $(U + L)/2$.

Frequency is the count of observations falling within a specific class interval.

Range is the difference between the maximum and minimum values in the entire dataset, not a specific class interval.

Therefore, the correct option is (B).

(B) Class Size


Question 5. The class mark of the class interval $20 - 30$ is:

(A) 20

(B) 30

(C) 25

(D) 10

Answer:


The class mark of a class interval is the midpoint of the interval.

It is calculated by adding the lower limit and the upper limit of the class interval and then dividing the sum by 2.

The formula for the class mark (CM) is:

$CM = \frac{\text{Lower Limit} + \text{Upper Limit}}{2}$

For the class interval $20 - 30$, the lower limit is 20 and the upper limit is 30.

Substituting these values into the formula:

$CM = \frac{20 + 30}{2}$

$CM = \frac{50}{2}$

$CM = 25$

Thus, the class mark of the class interval $20 - 30$ is 25.

Comparing this result with the given options, we find that 25 corresponds to option (C).

(C) 25
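The class mark and class size formulas from Questions 4 and 5 can be written as two small helper functions. This is only a sketch; the function names `class_mark` and `class_size` are introduced here for illustration.

```python
def class_mark(lower, upper):
    """Midpoint of a class interval: (lower limit + upper limit) / 2."""
    return (lower + upper) / 2

def class_size(lower, upper):
    """Width of a class interval: upper limit - lower limit."""
    return upper - lower

print(class_mark(20, 30))  # 25.0
print(class_size(20, 30))  # 10
```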


Question 6. A histogram is a graphical representation of a grouped frequency distribution in the form of rectangles with class intervals as bases and $\dots$ as heights.

(A) Class marks

(B) Frequencies

(C) Cumulative frequencies

(D) Class boundaries

Answer:


A histogram is a type of graph used to represent the distribution of numerical data. It is particularly useful for visualizing grouped frequency distributions.

In a histogram, the data is divided into class intervals, which are represented on the horizontal axis (the bases of the rectangles).

The vertical axis typically represents the frequency of the data falling within each class interval.

The height of each rectangle in a histogram corresponds to the number of observations (frequency) in the respective class interval, assuming the class intervals have equal width. If the class intervals have unequal widths, the height represents the frequency density, which is the frequency divided by the class width, ensuring the area of the rectangle is proportional to the frequency.

Looking at the options:

(A) Class marks are the midpoints of the class intervals, not the heights.

(B) Frequencies represent the count of data points within each class interval, and these counts determine the heights of the rectangles.

(C) Cumulative frequencies are running totals of frequencies and are represented in graphs like ogives, not histograms.

(D) Class boundaries define the limits of the class intervals on the horizontal axis, not the heights.

Therefore, the heights of the rectangles in a histogram represent the frequencies (or frequency densities).

The correct option is (B).

(B) Frequencies


Question 7. Which graphical representation uses bars to show the frequency of observations, where the width of the bars is arbitrary and the bars are separated by uniform gaps?

(A) Histogram

(B) Frequency Polygon

(C) Bar Graph

(D) Ogive

Answer:


Let's analyze the characteristics of each graphical representation mentioned:

A Histogram uses adjacent rectangles over class intervals where the area of each rectangle is proportional to the frequency of the observations in that interval. The rectangles are not separated by gaps (unless a class has zero frequency), and the width of the rectangles corresponds to the class width.

A Frequency Polygon is a line graph that connects the midpoints of the tops of the bars of a histogram or plots frequency against class marks.

A Bar Graph uses bars to represent the frequency or value of discrete categories or observations. The bars are typically separated by uniform gaps, and their width is usually uniform but arbitrary in the sense that it doesn't represent a continuous range like in a histogram.

An Ogive (or Cumulative Frequency Polygon) is a graph that displays the cumulative frequency of data by plotting the cumulative frequency against the upper class boundaries.

The description in the question, specifically "bars are separated by uniform gaps" and "the width of the bars is arbitrary," matches the characteristics of a Bar Graph.

Therefore, the correct option is (C).

(C) Bar Graph


Question 8. The midpoint of a class interval $a - b$ (inclusive) is given by:

(A) $b-a$

(B) $\frac{a+b}{2}$

(C) $\frac{a+b}{2} + \text{adjustment}$ (for exclusive classes)

(D) $\frac{b-a}{2}$

Answer:


The midpoint of a class interval, also known as the class mark, represents the central value of the interval.

It is calculated as the average of the lower limit and the upper limit of the class interval.

Let the lower limit of the class interval be $a$ and the upper limit be $b$.

The formula for the midpoint is:

$\text{Midpoint} = \frac{\text{Lower Limit} + \text{Upper Limit}}{2}$

Substituting $a$ for the lower limit and $b$ for the upper limit, we get:

$\text{Midpoint} = \frac{a + b}{2}$

Let's look at the given options:

(A) $b-a$ is the class size or width of the interval.

(B) $\frac{a+b}{2}$ is the average of the lower and upper limits, which is the definition of the midpoint.

(C) $\frac{a+b}{2} + \text{adjustment}$ is not the standard formula for the midpoint. Whether classes are inclusive or exclusive affects which values fall into an interval, but the midpoint is still the average of the stated limits.

(D) $\frac{b-a}{2}$ is half of the class size.

Therefore, the correct option is (B).

(B) $\frac{a+b}{2}$


Question 9. What is the range of the following data set: 25, 18, 20, 22, 16, 23, 17, 21, 19, 24?

(A) 10

(B) 9

(C) 8

(D) 25

Answer:


The range of a data set is the difference between the highest (maximum) and the lowest (minimum) values in the set.

The given data set is: 25, 18, 20, 22, 16, 23, 17, 21, 19, 24.

To find the range, we first identify the maximum and minimum values in this data set.

The maximum value in the data set is 25.

The minimum value in the data set is 16.

Now, we calculate the range using the formula:

Range = Maximum Value - Minimum Value

Range = $25 - 16$

Range = $9$

Therefore, the range of the given data set is 9.

Comparing this result with the given options, we find that 9 corresponds to option (B).

(B) 9
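The range calculation above can be verified with Python's built-in `max` and `min`. This is a minimal check rather than a required method; `data_range` is a name chosen here.

```python
# Data set from Question 9.
data = [25, 18, 20, 22, 16, 23, 17, 21, 19, 24]

# Range = maximum value - minimum value.
data_range = max(data) - min(data)

print(data_range)  # 9
```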


Question 10. Measures of central tendency give an idea of the $\dots$ of the data.

(A) Spread

(B) Maximum value

(C) Typical value

(D) Minimum value

Answer:


Measures of central tendency are statistical values that describe the center or a typical value of a data set.

Common measures of central tendency include the mean, median, and mode.

These measures aim to represent the whole set of data by a single value that lies within the range of the data.

Let's consider the options:

(A) Spread refers to how dispersed the data is (e.g., range, variance, standard deviation).

(B) Maximum value is the highest value in the data set.

(C) Typical value (or central value) is what measures of central tendency represent.

(D) Minimum value is the lowest value in the data set.

Therefore, measures of central tendency give an idea of the typical value of the data.

The correct option is (C).

(C) Typical value


Question 11. The arithmetic mean of the observations 5, 8, 10, 12, 15 is:

(A) 10

(B) 11

(C) 12

(D) 50

Answer:


The arithmetic mean (or average) of a set of observations is calculated by summing all the observations and dividing by the total number of observations.

Let the observations be $x_1, x_2, \dots, x_n$. The arithmetic mean, denoted by $\bar{x}$, is given by the formula:

$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$

In this case, the observations are 5, 8, 10, 12, and 15.

The number of observations is $n = 5$.

The sum of the observations is:

Sum $= 5 + 8 + 10 + 12 + 15$

Sum $= 50$

Now, we calculate the arithmetic mean:

$\bar{x} = \frac{50}{5}$

$\bar{x} = 10$

Thus, the arithmetic mean of the given observations is 10.

Comparing this result with the given options, we find that 10 corresponds to option (A).

(A) 10
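The mean formula translates directly into code: divide the sum of the observations by their count. A short sketch, with variable names chosen for illustration:

```python
# Observations from Question 11.
observations = [5, 8, 10, 12, 15]

# Arithmetic mean = sum of observations / number of observations.
mean = sum(observations) / len(observations)

print(mean)  # 10.0
```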


Question 12. The mode of a set of observations is the value that:

(A) Is in the middle when the data is sorted.

(B) Occurs most frequently.

(C) Is the sum of all observations divided by the number of observations.

(D) Is the difference between the highest and lowest observations.

Answer:


The mode is a measure of central tendency in statistics.

Let's examine the definitions provided in the options:

(A) The value that is in the middle when the data is sorted is called the median.

(B) The value that occurs most frequently in a data set is the definition of the mode.

(C) The sum of all observations divided by the number of observations is the definition of the arithmetic mean.

(D) The difference between the highest and lowest observations is the definition of the range.

Based on these definitions, the mode is the value that occurs most frequently.

Therefore, the correct option is (B).

(B) Occurs most frequently.


Question 13. Find the median of the following data: 7, 4, 9, 5, 10, 8, 6.

(A) 7

(B) 8

(C) 6

(D) 9

Answer:


To find the median of a data set, we must first arrange the observations in ascending or descending order.

The given data set is: 7, 4, 9, 5, 10, 8, 6.

Let's arrange the data in ascending order:

4, 5, 6, 7, 8, 9, 10

Next, we count the number of observations in the data set. There are 7 observations.

The number of observations, $n$, is 7, which is an odd number.

When the number of observations is odd, the median is the value located at the $\left(\frac{n+1}{2}\right)$-th position in the sorted data.

Position of the median $= \left(\frac{7+1}{2}\right)$-th position

Position of the median $= \left(\frac{8}{2}\right)$-th position

Position of the median $= 4$-th position

Now we look at the 4th value in the sorted data:

4, 5, 6, 7, 8, 9, 10

The value at the 4th position is 7.

Therefore, the median of the given data is 7.

Comparing this result with the given options, we find that 7 corresponds to option (A).

(A) 7
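The positional rule used above can be sketched as a small function. The odd case follows the $\left(\frac{n+1}{2}\right)$-th position rule from this answer; the even case (the average of the two middle values) is included so the function works for any list. The function name `median` is chosen here for illustration.

```python
def median(values):
    """Return the median of a non-empty list of numbers."""
    ordered = sorted(values)
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        # Odd n: the ((n + 1) / 2)-th value, which is index n // 2 counting from 0.
        return ordered[middle]
    # Even n: average of the (n / 2)-th and (n / 2 + 1)-th values.
    return (ordered[middle - 1] + ordered[middle]) / 2

print(median([7, 4, 9, 5, 10, 8, 6]))  # 7
```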


Question 14. Which measure of central tendency is most affected by extreme values (outliers)?

(A) Mean

(B) Median

(C) Mode

(D) All are equally affected.

Answer:


Let's analyze how each measure of central tendency is affected by extreme values (outliers).

The Mean is calculated by summing all the observations and dividing by the number of observations. The formula is $\bar{x} = \frac{\sum x_i}{n}$. Since the mean is calculated using every value in the data set, a single very large or very small value (an outlier) can significantly pull the mean towards it, shifting it away from the center of the majority of the data.

The Median is the middle value of a data set when it is arranged in order. If there are an odd number of observations, it's the single middle value. If there are an even number, it's the average of the two middle values. Outliers (extreme values) do not affect the median as much as the mean because they only change the position of the middle value(s) slightly, if at all, rather than directly contributing to the sum.

The Mode is the value that occurs most frequently in the data set. An outlier is typically a value that is far removed from the other values and is usually not repeated often. Therefore, an outlier is unlikely to become the mode unless it is the only value that appears more than once in a small dataset.

Comparing the effects, the mean is directly influenced by the magnitude of each value, including extreme ones. The median and mode are positional or frequency-based measures and are much less sensitive to the exact values of outliers.

Therefore, the measure of central tendency most affected by extreme values is the mean.

The correct option is (A).

(A) Mean
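A quick numerical experiment makes the difference visible. The two data sets below are hypothetical, invented only to illustrate the point: they are identical except that the largest value is replaced by an extreme one.

```python
from statistics import mean, median

# Hypothetical data: the second list replaces 16 with the outlier 160.
typical = [10, 12, 13, 14, 16]
with_outlier = [10, 12, 13, 14, 160]

# The mean shifts sharply towards the outlier; the median does not move.
print(mean(typical), median(typical))
print(mean(with_outlier), median(with_outlier))
```

Here the mean rises from 13 to 41.8 while the median stays at 13, which matches the reasoning above.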


Question 15. Match the statistical measure in Column A with its description in Column B:

(i) Mean

(ii) Median

(iii) Mode

(iv) Frequency

(a) Middle value in sorted data

(b) Average value

(c) Number of times an observation occurs

(d) Most frequent value

(A) (i)-(a), (ii)-(b), (iii)-(d), (iv)-(c)

(B) (i)-(b), (ii)-(a), (iii)-(d), (iv)-(c)

(C) (i)-(b), (ii)-(d), (iii)-(a), (iv)-(c)

(D) (i)-(a), (ii)-(d), (iii)-(b), (iv)-(c)

Answer:


Let's match each statistical measure from Column A with its correct description from Column B.

(i) Mean: The mean is the arithmetic average of a set of observations. This matches description (b) Average value.

(ii) Median: The median is the middle value in a data set that has been sorted in ascending or descending order. This matches description (a) Middle value in sorted data.

(iii) Mode: The mode is the value that appears most frequently in a data set. This matches description (d) Most frequent value.

(iv) Frequency: Frequency refers to the number of times a particular observation occurs in a data set. This matches description (c) Number of times an observation occurs.

Based on these definitions, the correct matching is as follows:

| Column A | Column B (Description) | Match |
| --- | --- | --- |
| (i) Mean | (b) Average value | (i) - (b) |
| (ii) Median | (a) Middle value in sorted data | (ii) - (a) |
| (iii) Mode | (d) Most frequent value | (iii) - (d) |
| (iv) Frequency | (c) Number of times an observation occurs | (iv) - (c) |

Comparing the matching with the given options:

(A) (i)-(a), (ii)-(b), (iii)-(d), (iv)-(c) - Incorrect

(B) (i)-(b), (ii)-(a), (iii)-(d), (iv)-(c) - Correct

(C) (i)-(b), (ii)-(d), (iii)-(a), (iv)-(c) - Incorrect

(D) (i)-(a), (ii)-(d), (iii)-(b), (iv)-(c) - Incorrect

Therefore, the correct option is (B).

(B) (i)-(b), (ii)-(a), (iii)-(d), (iv)-(c)


Question 16. Assertion (A): The mean is always one of the observations in the data set.

Reason (R): The mean is calculated by dividing the sum of observations by the number of observations.

(A) Both A and R are true and R is the correct explanation of A.

(B) Both A and R are true but R is not the correct explanation of A.

(C) A is true but R is false.

(D) A is false but R is true.

Answer:


Let's analyze the Assertion (A) and the Reason (R) separately.

Assertion (A): The mean is always one of the observations in the data set.

Consider the data set $\{1, 2, 3\}$.

The sum is $1 + 2 + 3 = 6$. The number of observations is 3.

The mean is $\frac{6}{3} = 2$. Here, the mean (2) is one of the observations.

Consider the data set $\{1, 2, 4\}$.

The sum is $1 + 2 + 4 = 7$. The number of observations is 3.

The mean is $\frac{7}{3} \approx 2.33$. Here, the mean ($\approx 2.33$) is not one of the observations (1, 2, 4).

Therefore, Assertion (A) is false.

Reason (R): The mean is calculated by dividing the sum of observations by the number of observations.

The formula for the arithmetic mean ($\bar{x}$) of $n$ observations $x_1, x_2, \dots, x_n$ is $\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$.

This statement accurately describes how the arithmetic mean is calculated.

Therefore, Reason (R) is true.

Now, let's look at the options based on our evaluation:

(A) Both A and R are true and R is the correct explanation of A. (Incorrect, A is false)

(B) Both A and R are true but R is not the correct explanation of A. (Incorrect, A is false)

(C) A is true but R is false. (Incorrect, A is false and R is true)

(D) A is false but R is true. (Correct, A is false and R is true)

The correct option is (D).

(D) A is false but R is true.


Question 17. Assertion (A): A frequency polygon is drawn by plotting points corresponding to the class marks and frequencies.

Reason (R): The class marks represent the midpoints of the class intervals.

(A) Both A and R are true and R is the correct explanation of A.

(B) Both A and R are true but R is not the correct explanation of A.

(C) A is true but R is false.

(D) A is false but R is true.

Answer:


Let's evaluate the Assertion (A) and the Reason (R).

Assertion (A): A frequency polygon is drawn by plotting points corresponding to the class marks and frequencies.

A frequency polygon is indeed constructed by plotting points on a graph. The horizontal axis represents the values of the variable, typically using class marks for grouped data, and the vertical axis represents the frequencies. The points plotted are (class mark, frequency) for each class interval. These points are then joined by straight line segments to form the polygon.

Thus, Assertion (A) is true.

Reason (R): The class marks represent the midpoints of the class intervals.

The class mark of a class interval is defined as the average of its lower and upper limits, which is its midpoint. It is calculated as $\frac{\text{Lower Limit} + \text{Upper Limit}}{2}$.

Thus, Reason (R) is true.

Now, let's consider if Reason (R) is the correct explanation for Assertion (A).

Assertion (A) describes the process of drawing a frequency polygon using class marks and frequencies. Reason (R) explains what class marks are. The reason why class marks are used on the horizontal axis when plotting a frequency polygon is precisely because they represent the central value or midpoint of each class interval, serving as a representative point for all observations within that interval. Therefore, the fact that class marks are midpoints explains why they are used in the manner described in Assertion (A) to plot the points for a frequency polygon.

Hence, Reason (R) is the correct explanation for Assertion (A).

Based on our evaluation: A is true, R is true, and R is the correct explanation of A.

This corresponds to option (A).

(A) Both A and R are true and R is the correct explanation of A.


Question 18. Case Study: A survey was conducted in a local market to find the number of customers visiting a fruit stall each hour for 10 hours:

15, 18, 20, 15, 25, 18, 15, 22, 20, 25

What is the mode of the number of customers?

(A) 15

(B) 18

(C) 20

(D) 25

Answer:


The mode of a set of observations is the value that occurs most frequently in the data set.

The given data set for the number of customers visiting the fruit stall each hour is:

15, 18, 20, 15, 25, 18, 15, 22, 20, 25

To find the mode, we need to count the frequency of each distinct value in the data set.

Let's list the unique values and their counts:

- Value 15: appears 3 times

- Value 18: appears 2 times

- Value 20: appears 2 times

- Value 22: appears 1 time

- Value 25: appears 2 times

The value that appears most frequently is 15, with a frequency of 3.

Therefore, the mode of the number of customers is 15.

Comparing this result with the given options, we find that 15 corresponds to option (A).

(A) 15
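The counting steps above can be sketched in Python. Collecting every value that reaches the highest count, rather than just one, is a design choice made here so that a data set with several modes would also be reported correctly; the variable names are illustrative.

```python
from collections import Counter

# Hourly customer counts from Question 18.
customers = [15, 18, 20, 15, 25, 18, 15, 22, 20, 25]

# Count how often each distinct value occurs.
counts = Counter(customers)

# Keep every value whose count equals the highest count.
highest = max(counts.values())
modes = [value for value, count in counts.items() if count == highest]

print(modes)  # [15]
```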


Question 19. Case Study: Refer to the data in Question 18.

What is the median number of customers?

(A) 18

(B) 20

(C) 19

(D) 21

Answer:


To find the median of a data set, we must first arrange the observations in ascending or descending order.

The data set from Question 18 is: 15, 18, 20, 15, 25, 18, 15, 22, 20, 25.

Let's arrange the data in ascending order:

15, 15, 15, 18, 18, 20, 20, 22, 25, 25

Next, we count the number of observations in the data set. There are 10 observations.

The number of observations, $n$, is 10, which is an even number.

When the number of observations is even, the median is the average of the values located at the $\left(\frac{n}{2}\right)$-th position and the $\left(\frac{n}{2}+1\right)$-th position in the sorted data.

The $\left(\frac{n}{2}\right)$-th position is the $\left(\frac{10}{2}\right)$-th = 5th position.

The $\left(\frac{n}{2}+1\right)$-th position is the $\left(\frac{10}{2}+1\right)$-th = (5+1)-th = 6th position.

Now we look at the 5th and 6th values in the sorted data:

15, 15, 15, 18, 18 (5th), 20 (6th), 20, 22, 25, 25

The value at the 5th position is 18.

The value at the 6th position is 20.

The median is the average of these two values:

Median $= \frac{18 + 20}{2}$

Median $= \frac{38}{2}$

Median $= 19$

Therefore, the median number of customers is 19.

Comparing this result with the given options, we find that 19 corresponds to option (C).

(C) 19


Question 20. Which of the following is NOT a measure of central tendency?

(A) Mean

(B) Standard Deviation

(C) Median

(D) Mode

Answer:


Measures of central tendency are statistical values that describe the center or a typical value of a data set.

Let's examine each option:

(A) Mean: The mean is the arithmetic average of the data set. It is a widely used measure of central tendency.

(B) Standard Deviation: Standard deviation is a measure of the amount of variation or dispersion of a set of values. A low standard deviation indicates that the values tend to be close to the mean of the set, while a high standard deviation indicates that the values are spread out over a wider range. It is a measure of dispersion, not central tendency.

(C) Median: The median is the middle value in a data set that has been sorted in ascending or descending order. It is a measure of central tendency.

(D) Mode: The mode is the value that occurs most frequently in a data set. It is a measure of central tendency.

Based on these definitions, the Standard Deviation is the only option that is not a measure of central tendency. It is a measure of the spread or variability of the data.

Therefore, the correct option is (B).

(B) Standard Deviation


Question 21. The marks obtained by 5 students are 70, 60, 80, 90, 50. If the marks of one student, 50, are incorrectly recorded as 40, what happens to the mean?

(A) It increases.

(B) It decreases.

(C) It remains the same.

(D) Cannot be determined.

Answer:


The arithmetic mean of a set of observations is calculated by summing all the observations and dividing by the total number of observations.

The formula for the mean ($\bar{x}$) is $\bar{x} = \frac{\text{Sum of observations}}{\text{Number of observations}}$.

Initially, the data set is 70, 60, 80, 90, 50.

The sum of the original observations is $70 + 60 + 80 + 90 + 50 = 350$.

The number of observations is 5.

The original mean is $\frac{350}{5} = 70$.

When the marks of one student, 50, are incorrectly recorded as 40, the new data set becomes 70, 60, 80, 90, 40.

The sum of the new observations is $70 + 60 + 80 + 90 + 40 = 340$.

The number of observations is still 5.

The new mean is $\frac{340}{5} = 68$.

Comparing the original mean (70) and the new mean (68), we can see that the mean has decreased.

In general, if one observation in a data set is replaced by a smaller value, the sum of observations will decrease. Since the number of observations remains the same, the mean will decrease.

Therefore, if the marks of 50 are incorrectly recorded as 40 (a smaller value), the mean will decrease.

The correct option is (B).

(B) It decreases.
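The same result can be reached without recomputing the whole sum. As a sketch, suppose one of $n$ observations changes from $a$ to $b$ (symbols introduced here for illustration). The sum changes by $b - a$, so:

$\bar{x}_{\text{new}} = \bar{x}_{\text{old}} + \frac{b - a}{n}$

$\bar{x}_{\text{new}} = 70 + \frac{40 - 50}{5} = 70 - 2 = 68$

Because $b - a$ is negative here, the mean must decrease.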


Question 22. In a grouped frequency distribution with exclusive classes like $10-20, 20-30, \dots$, the upper limit of a class interval is:

(A) Included in the class.

(B) Excluded from the class.

(C) Included in the next class.

(D) Both (B) and (C).

Answer:


In a grouped frequency distribution, class intervals can be either inclusive or exclusive.

Exclusive class intervals, like $10-20, 20-30, 30-40$, etc., are defined such that the upper limit of a class is not included in that class.

For example, in the class interval $10-20$, values from 10 up to (but not including) 20 are included. A value of exactly 20 is not included in the $10-20$ class.

A value that is equal to the upper limit of a class interval in an exclusive distribution is included in the next class interval, where it serves as the lower limit.

For instance, a value of 20 would be included in the $20-30$ class interval, not the $10-20$ class interval.

Therefore, for an exclusive class interval:

- The upper limit is excluded from the current class.

- A value equal to the upper limit is included in the next class (where it is the lower limit).

Looking at the options:

(A) Included in the class - This is incorrect for exclusive classes.

(B) Excluded from the class - This is correct for exclusive classes.

(C) Included in the next class - This is correct for a value equal to the upper limit.

(D) Both (B) and (C) - This accurately describes what happens with the upper limit value; it's excluded from the current class and included in the next.

The question asks about the upper limit of an exclusive class. Option (B) states the defining property of exclusive classes, and option (C) states where a value equal to that upper limit is placed instead. Since both statements hold together, (D) gives the complete rule.

Thus, the correct option is (D).

(D) Both (B) and (C).
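The rule for exclusive classes amounts to the test "lower limit $\leq$ value $<$ upper limit". A minimal sketch follows; the function name `find_class` and the list of boundaries are assumptions made for illustration.

```python
def find_class(value, boundaries):
    """Return the exclusive class (lower, upper) containing value, or None."""
    for lower, upper in zip(boundaries, boundaries[1:]):
        # The lower limit is included, the upper limit is excluded.
        if lower <= value < upper:
            return (lower, upper)
    return None

# Boundaries for the classes 10-20, 20-30, 30-40.
boundaries = [10, 20, 30, 40]

print(find_class(20, boundaries))  # (20, 30)
```

A value of exactly 20 lands in the class $20-30$, not $10-20$, as described above.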


Question 23. The data collected by the investigator himself for a definite plan is called:

(A) Secondary data

(B) Primary data

(C) Grouped data

(D) Ungrouped data

Answer:


Data can be classified based on the method of collection.

Primary data is data collected by the investigator or researcher themselves directly from the original source for a specific purpose or plan.

Secondary data is data that has been collected by someone else and is available from existing sources (like publications, databases, etc.).

Grouped data and Ungrouped data refer to the way the data is presented, not how it was collected. Ungrouped data is raw data as collected, while grouped data is organized into classes or categories with frequencies.

The question describes data collected by the investigator himself for a definite plan, which is the definition of primary data.

Therefore, the correct option is (B).

(B) Primary data


Question 24. If the mean of 5 observations $x, x+2, x+4, x+6, x+8$ is 11, find the value of $x$.

(A) 5

(B) 7

(C) 9

(D) 11

Answer:


The given observations are $x, x+2, x+4, x+6, x+8$.

The number of observations is $n = 5$.

The given mean is $11$.

The formula for the arithmetic mean ($\bar{x}$) is the sum of the observations divided by the number of observations:

$\bar{x} = \frac{\text{Sum of observations}}{\text{Number of observations}}$

First, let's find the sum of the given observations:

Sum $= x + (x+2) + (x+4) + (x+6) + (x+8)$

Sum $= x + x+2 + x+4 + x+6 + x+8$

Combine the terms with $x$ and the constant terms:

Sum $= (x+x+x+x+x) + (2+4+6+8)$

Sum $= 5x + 20$

Now, we are given that the mean is 11. We can set up the equation using the mean formula:

$11 = \frac{5x + 20}{5}$

To solve for $x$, multiply both sides of the equation by 5:

$11 \times 5 = 5x + 20$

$55 = 5x + 20$

Subtract 20 from both sides of the equation:

$55 - 20 = 5x$

$35 = 5x$

Divide both sides by 5:

$x = \frac{35}{5}$

$x = 7$

Thus, the value of $x$ is 7.

Comparing this result with the given options, we find that 7 corresponds to option (B).

(B) 7
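
As a quick numerical check of the algebra above, the following minimal Python sketch solves for $x$ and then rebuilds the five observations to confirm their mean. The function name `solve_x` is hypothetical and introduced only for illustration.

```python
from fractions import Fraction

def solve_x(offsets, target_mean):
    """Solve mean(x + d for d in offsets) == target_mean for x."""
    n = len(offsets)
    # mean = (n*x + sum(offsets)) / n, so x = target_mean - sum(offsets) / n
    return Fraction(target_mean) - Fraction(sum(offsets), n)

offsets = [0, 2, 4, 6, 8]          # the observations are x, x+2, x+4, x+6, x+8
x = solve_x(offsets, 11)

# Rebuild the observations and recompute the mean as a check.
observations = [x + d for d in offsets]
mean = sum(observations) / len(observations)
print(x, mean)
```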


Question 25. The marks obtained by 15 students in an exam (out of 25) are:

15, 18, 20, 16, 25, 15, 18, 20, 25, 16,
15, 18, 20, 16, 25

Represent the data in an ungrouped frequency distribution table. What is the frequency of the mark 15?

(A) $\bcancel{||||}$

(B) $||||$

(C) $|||$

(D) $\bcancel{||||} \quad |$

Answer:


The frequency of a particular observation in a data set is the number of times that observation appears in the data.

The given data set of marks is: 15, 18, 20, 16, 25, 15, 18, 20, 25, 16, 15, 18, 20, 16, 25.

We need to find the frequency of the mark 15. Let's count the occurrences of 15 in the data:

15 appears in the 1st position.

15 appears in the 6th position.

15 appears in the 11th position.

The mark 15 appears 3 times in the data set.

Therefore, the frequency of the mark 15 is 3.

We are asked to represent this frequency using tally marks as shown in the options.

According to the provided format for tally marks:

For a frequency of 3, the tally mark representation is $|||$.

Now, let's look at the options:

(A) $\bcancel{||||}$ represents a frequency of 5.

(B) $||||$ represents a frequency of 4.

(C) $|||$ represents a frequency of 3.

(D) $\bcancel{||||} \quad |$ represents a frequency of $5 + 1 = 6$.

The tally mark representation for a frequency of 3 is $|||$, which corresponds to option (C).

Although the question mentions representing the data in an ungrouped frequency distribution table, the core question is specifically about the frequency of the mark 15 and its representation.

For completeness, an ungrouped frequency distribution table would look like this:

Mark    Tally Marks    Frequency
15      $|||$          3
16      $|||$          3
18      $|||$          3
20      $|||$          3
25      $|||$          3

The frequency of the mark 15 is 3, which is represented by $|||$.

The correct option is (C).

(C) $|||$
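
The counting above can also be sketched in Python. This is only an illustrative check using the standard library's `collections.Counter`; the variable names are assumptions, and the tally column uses plain strokes rather than struck-through groups of five.

```python
from collections import Counter

marks = [15, 18, 20, 16, 25, 15, 18, 20, 25, 16, 15, 18, 20, 16, 25]

# Count how many times each distinct mark occurs.
frequency = Counter(marks)

# Print one row per distinct mark: mark, tally strokes, frequency.
for mark in sorted(frequency):
    count = frequency[mark]
    print(mark, "|" * count, count)
```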


Question 26. For data with an even number of observations, the median is:

(A) The average of the two middle observations after sorting.

(B) The exact middle observation.

(C) The most frequent observation.

(D) The difference between the highest and lowest.

Answer:


The median is a measure of central tendency that represents the middle value of a data set when the data is arranged in ascending or descending order.

The method for finding the median depends on whether the number of observations ($n$) is odd or even.

If the number of observations ($n$) is odd, the median is the single middle observation, which is located at the $\left(\frac{n+1}{2}\right)$-th position in the sorted data.

If the number of observations ($n$) is even, there are two middle observations. These are located at the $\left(\frac{n}{2}\right)$-th position and the $\left(\frac{n}{2}+1\right)$-th position in the sorted data.

In the case of an even number of observations, the median is calculated as the arithmetic mean (average) of these two middle observations.

Let's look at the given options:

(A) The average of the two middle observations after sorting. This matches the definition of the median for data with an even number of observations.

(B) The exact middle observation. This is true only for data with an odd number of observations.

(C) The most frequent observation. This is the definition of the mode.

(D) The difference between the highest and lowest. This is the definition of the range.

Therefore, for data with an even number of observations, the median is the average of the two middle observations after sorting.

The correct option is (A).

(A) The average of the two middle observations after sorting.
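
A minimal sketch of the two cases described above, assuming a hypothetical helper named `median` and small made-up lists chosen only to illustrate the odd and even cases:

```python
def median(values):
    """Return the median of a non-empty list of numbers."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # Odd n: the single middle observation, at position (n + 1) / 2.
        return ordered[mid]
    # Even n: average of the observations at positions n/2 and n/2 + 1.
    return (ordered[mid - 1] + ordered[mid]) / 2

even_result = median([7, 3, 9, 5])   # sorted: 3, 5, 7, 9
odd_result = median([7, 3, 9])       # sorted: 3, 7, 9
print(even_result, odd_result)
```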


Question 27. What is the class size of the grouped frequency distribution given by the classes $0-5, 5-10, 10-15, \dots$?

(A) 10

(B) 5

(C) 15

(D) 20

Answer:


The class size (or class width) of a class interval in a frequency distribution is the difference between the upper limit and the lower limit of that class interval.

The given class intervals are $0-5, 5-10, 10-15, \dots$ These are exclusive class intervals.

Let's calculate the class size for one of the intervals, for example, the first interval $0-5$.

Lower limit $= 0$

Upper limit $= 5$

Class Size $= \text{Upper Limit} - \text{Lower Limit}$

Class Size $= 5 - 0$

Class Size $= 5$

Let's check another interval, say $5-10$.

Lower limit $= 5$

Upper limit $= 10$

Class Size $= 10 - 5$

Class Size $= 5$

Since the intervals are consistent ($0-5, 5-10, 10-15$, etc.), the class size is uniform across all intervals.

Therefore, the class size of the given grouped frequency distribution is 5.

Comparing this result with the given options, we find that 5 corresponds to option (B).

(B) 5


Question 28. A cumulative frequency table shows:

(A) The frequency of each observation.

(B) The total frequency up to a certain class or observation.

(C) The difference between consecutive frequencies.

(D) The product of observation and frequency.

Answer:


A frequency distribution table shows the frequency (count) of each observation or each class interval.

A cumulative frequency is the running total of frequencies. It is calculated by adding the frequency of the current observation or class interval to the cumulative frequency of the previous observation or class interval.

A cumulative frequency table lists the observations or class intervals along with their cumulative frequencies.

Let's examine the options:

(A) The frequency of each observation. This is shown in a simple frequency distribution table, not a cumulative frequency table. The cumulative frequency table shows the total frequency up to that observation.

(B) The total frequency up to a certain class or observation. This is precisely what cumulative frequency represents and what a cumulative frequency table displays.

(C) The difference between consecutive frequencies. This is not directly shown in a cumulative frequency table; rather, the difference between consecutive cumulative frequencies gives the frequency of that observation or class.

(D) The product of observation and frequency. This calculation is typically used to compute statistics like the mean, not what is displayed in a cumulative frequency table.

Therefore, a cumulative frequency table shows the total frequency up to a certain class or observation.

The correct option is (B).

(B) The total frequency up to a certain class or observation.
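
To illustrate the running total, here is a short Python sketch that reuses the ungrouped marks from Question 25 (each of the marks 15, 16, 18, 20, 25 has frequency 3). It also shows the point made under option (C): differences of consecutive cumulative frequencies give back the original frequencies. Variable names are assumptions for illustration.

```python
from itertools import accumulate

marks = [15, 16, 18, 20, 25]
frequencies = [3, 3, 3, 3, 3]

# Cumulative frequency: running total of the frequencies.
cumulative = list(accumulate(frequencies))

# Differences of consecutive cumulative frequencies recover the frequencies.
previous = [0] + cumulative[:-1]
recovered = [c - p for c, p in zip(cumulative, previous)]

for mark, f, cf in zip(marks, frequencies, cumulative):
    print(mark, f, cf)
```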


Question 29. Case Study: The daily wages (in $\textsf{₹}$) of 10 workers in a small factory are:

300, 350, 320, 300, 400, 350, 380, 300, 320, 350

What is the mode of the daily wages?

(A) $\textsf{₹} 300$

(B) $\textsf{₹} 350$

(C) $\textsf{₹} 320$

(D) There are two modes: $\textsf{₹} 300$ and $\textsf{₹} 350$.

Answer:


The mode of a set of observations is the value that occurs most frequently in the data set.

The given data set for the daily wages (in $\textsf{₹}$) of 10 workers is:

300, 350, 320, 300, 400, 350, 380, 300, 320, 350

To find the mode, we need to count the frequency of each distinct wage amount in the data set.

Let's list the unique wage amounts and their counts:

- Wage $\textsf{₹} 300$: appears 3 times (1st, 4th, and 8th positions)

- Wage $\textsf{₹} 350$: appears 3 times (2nd, 6th, and 10th positions)

- Wage $\textsf{₹} 320$: appears 2 times (3rd and 9th positions)

- Wage $\textsf{₹} 400$: appears 1 time (5th position)

- Wage $\textsf{₹} 380$: appears 1 time (7th position)

The wage amounts with the highest frequency are $\textsf{₹} 300$ and $\textsf{₹} 350$, both appearing 3 times.

When two or more values have the same highest frequency, the data set is said to be multimodal. If there are exactly two modes, it is bimodal.

In this case, the data set has two modes: $\textsf{₹} 300$ and $\textsf{₹} 350$.

Comparing this result with the given options, we find that option (D) correctly identifies both modes.

(D) There are two modes: $\textsf{₹} 300$ and $\textsf{₹} 350$.
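
The same result can be checked with Python's standard library. This sketch assumes Python 3.8 or later, where `statistics.multimode` returns every value that shares the highest frequency.

```python
from collections import Counter
from statistics import multimode

wages = [300, 350, 320, 300, 400, 350, 380, 300, 320, 350]

# Frequency of each distinct wage.
counts = Counter(wages)

# All values tied for the highest frequency, in order of first appearance.
modes = multimode(wages)
print(counts, modes)
```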


Question 30. Case Study: Refer to the data in Question 29.

What is the mean daily wage?

(A) $\textsf{₹} 340$

(B) $\textsf{₹} 335$

(C) $\textsf{₹} 345$

(D) $\textsf{₹} 350$

Answer:


The mean (or arithmetic average) of a set of observations is calculated by summing all the observations and dividing by the total number of observations.

The given data set for the daily wages (in $\textsf{₹}$) of 10 workers is:

300, 350, 320, 300, 400, 350, 380, 300, 320, 350

The number of observations is $n = 10$.

The formula for the mean ($\bar{x}$) is:

$\bar{x} = \frac{\text{Sum of observations}}{\text{Number of observations}}$

First, let's calculate the sum of the observations:

Sum $= 300 + 350 + 320 + 300 + 400 + 350 + 380 + 300 + 320 + 350$

Sum $= 3370$

Now, calculate the mean:

$\bar{x} = \frac{3370}{10}$

$\bar{x} = 337$

The calculated mean daily wage is $\textsf{₹} 337$.

Comparing the calculated mean with the given options:

(A) $\textsf{₹} 340$

(B) $\textsf{₹} 335$

(C) $\textsf{₹} 345$

(D) $\textsf{₹} 350$

The calculated value $\textsf{₹} 337$ does not match any of the given options exactly. Among the options, $\textsf{₹} 335$ is the closest (a difference of $\textsf{₹} 2$).

Therefore, based on the provided options, the closest value is (B).

(B) $\textsf{₹} 335$
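
A short Python sketch of this calculation is given below. It recomputes the sum and the mean, then measures how far each option lies from the mean; the dictionary `options` simply restates the four choices and the other names are assumptions for illustration.

```python
wages = [300, 350, 320, 300, 400, 350, 380, 300, 320, 350]

total = sum(wages)
mean_wage = total / len(wages)

# The four answer choices, keyed by option letter.
options = {"A": 340, "B": 335, "C": 345, "D": 350}

# Pick the option with the smallest absolute difference from the mean.
closest = min(options, key=lambda letter: abs(options[letter] - mean_wage))
print(total, mean_wage, closest)
```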


Question 31. The empirical relationship between the three measures of central tendency is approximately:

(A) Mean - Mode = 3 (Mean - Median)

(B) Mode - Median = 3 (Mean - Mode)

(C) Median - Mode = 3 (Median - Mean)

(D) Mode = 3 Median - 2 Mean

Answer:


For a moderately skewed distribution, there is an empirical relationship between the three measures of central tendency: Mean, Median, and Mode.

This empirical formula is given by:

Mode $\approx$ 3 Median - 2 Mean

This formula is used when one measure of central tendency needs to be estimated based on the other two, particularly when the distribution is not symmetric.

Let's examine the given options to see which one matches this relationship:

(A) Mean - Mode = 3 (Mean - Median)

Rearranging this equation:

Mean - Mode = 3 Mean - 3 Median

-Mode = 3 Mean - 3 Median - Mean

-Mode = 2 Mean - 3 Median

Mode = -(2 Mean - 3 Median)

Mode = 3 Median - 2 Mean

Option (A) is an algebraic rearrangement of the empirical formula.

(B) Mode - Median = 3 (Mean - Mode)

Mode - Median = 3 Mean - 3 Mode

Mode + 3 Mode = 3 Mean + Median

4 Mode = 3 Mean + Median

Mode = $\frac{3 \text{ Mean} + \text{Median}}{4}$ (Does not match)

(C) Median - Mode = 3 (Median - Mean)

Median - Mode = 3 Median - 3 Mean

-Mode = 3 Median - 3 Mean - Median

-Mode = 2 Median - 3 Mean

Mode = -(2 Median - 3 Mean)

Mode = 3 Mean - 2 Median (Does not match)

(D) Mode = 3 Median - 2 Mean

This is the standard form of the empirical relationship.

Both (A) and (D) represent the same relationship algebraically. However, option (D) is the relationship expressed in its most commonly cited form, where the Mode is explicitly related to the Median and Mean.

Therefore, the correct option is (D).

(D) Mode = 3 Median - 2 Mean


Question 32. Which of the following graphs allows you to find the median of the data?

(A) Bar Graph

(B) Histogram

(C) Frequency Polygon

(D) Ogive (Cumulative Frequency Curve)

Answer:


The median of a data set is the middle value when the data is arranged in order. Graphically, the median can be determined using cumulative frequency curves.

Let's consider the options:

(A) A Bar Graph is used to compare discrete categories or values and their frequencies. It is not directly used to find the median of a distribution.

(B) A Histogram represents the frequency distribution of continuous data using adjacent bars. While it helps visualize the shape of the distribution and locate the modal class, it is not the primary graphical tool for finding the median. Finding the median from a histogram usually involves approximating it within the median class.

(C) A Frequency Polygon is a line graph connecting the class marks and frequencies. Similar to a histogram, it visualizes the distribution but isn't typically used to find the median graphically.

(D) An Ogive, also known as a cumulative frequency curve, plots cumulative frequency against the upper or lower boundaries of class intervals. By using the cumulative frequency total ($N$), the median ($N/2$) can be located on the y-axis, and a horizontal line drawn to the ogive. Dropping a vertical line from the intersection point to the x-axis gives the median value.

Both "less than" and "more than" ogives can be drawn on the same axes; the x-coordinate of their intersection point also gives the median.

Therefore, the graph that allows you to find the median of the data graphically is an Ogive.

The correct option is (D).

(D) Ogive (Cumulative Frequency Curve)


Question 33. If the mean of 10 observations is 20 and the mean of another 15 observations is 24, the mean of all 25 observations is:

(A) 22

(B) 22.4

(C) 22.8

(D) 23

Answer:


We are given information about two sets of observations.

For the first set of observations:

Number of observations ($n_1$) = 10

Mean ($\bar{x}_1$) = 20

The sum of observations in the first set (Sum$_1$) can be calculated using the formula: Sum = Mean $\times$ Number of observations.

Sum$_1 = \bar{x}_1 \times n_1$

Sum$_1 = 20 \times 10$

Sum$_1 = 200$


For the second set of observations:

Number of observations ($n_2$) = 15

Mean ($\bar{x}_2$) = 24

The sum of observations in the second set (Sum$_2$) is:

Sum$_2 = \bar{x}_2 \times n_2$

Sum$_2 = 24 \times 15$

Sum$_2 = 360$


For all 25 observations combined:

The total number of observations ($N$) is the sum of the number of observations in the two sets:

$N = n_1 + n_2$

$N = 10 + 15 = 25$

The total sum of observations (Sum$_{total}$) is the sum of the sums of the two sets:

Sum$_{total} = \text{Sum}_1 + \text{Sum}_2$

Sum$_{total} = 200 + 360 = 560$

The mean of all 25 observations ($\bar{x}_c$) is the total sum of observations divided by the total number of observations:

$\bar{x}_c = \frac{\text{Sum}_{total}}{N}$

$\bar{x}_c = \frac{560}{25}$

To calculate $\frac{560}{25}$, we can simplify the fraction or perform division:

$\frac{560}{25} = \frac{560 \div 5}{25 \div 5} = \frac{112}{5}$

$\frac{112}{5} = 22.4$

Alternatively, using division:

$\begin{array}{r} 22.4\phantom{} \\ 25{\overline{\smash{\big)}\,560.0\phantom{)}}}\\ \underline{-~\phantom{(}50\phantom{0.0)}}\\ 60\phantom{.0)}\\ \underline{-~\phantom{()}50\phantom{.0)}}\\ 100\phantom{)}\\ \underline{-~\phantom{()}100\phantom{)}}\\ 0\phantom{)} \end{array}$

So, the mean of all 25 observations is 22.4.

Comparing this result with the given options, we find that 22.4 corresponds to option (B).

(B) 22.4
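
The combined-mean steps above can be sketched in Python as follows. The helper name `combined_mean` is hypothetical; exact fractions are used so that the result is not affected by floating-point rounding.

```python
from fractions import Fraction

def combined_mean(groups):
    """Combined mean of several groups given as (count, mean) pairs."""
    groups = list(groups)
    # Sum of each group = mean * count; add these to get the total sum.
    total_sum = sum(Fraction(n) * Fraction(m) for n, m in groups)
    total_count = sum(n for n, _ in groups)
    return total_sum / total_count

result = combined_mean([(10, 20), (15, 24)])
print(result, float(result))
```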


Question 34. The data obtained from the internet or published reports is known as:

(A) Primary data

(B) Secondary data

(C) Raw data

(D) Organized data

Answer:


Data can be classified based on its source relative to the investigator.

Primary data is original data collected by the investigator for their specific purpose. Examples include data collected through surveys, experiments, or direct observation conducted by the researcher.

Secondary data is data that has already been collected by someone else and is available from other sources. Examples include data obtained from publications, the internet, databases, archives, or other existing records.

Raw data refers to data that has not been processed or analyzed yet. It can be either primary or secondary data in its original, unorganized form.

Organized data refers to data that has been structured or arranged in a meaningful way, such as in tables or charts. Both primary and secondary data can be organized.

The question describes data obtained from sources like the internet or published reports. This data was collected by others before being accessed by the current investigator.

Therefore, this type of data is known as secondary data.

The correct option is (B).

(B) Secondary data


Question 35. Which measure of central tendency is appropriate for qualitative data like 'favourite colour' or 'preferred mode of transport'?

(A) Mean

(B) Median

(C) Mode

(D) All of the above

Answer:


Measures of central tendency provide a single value that attempts to describe the center of a data set.

Let's consider the applicability of the given measures to qualitative data:

Mean: The mean is calculated by summing numerical values and dividing by the number of values. It is applicable only to quantitative data (data that is numerical and on an interval or ratio scale), as it requires arithmetic operations.

Median: The median is the middle value in a data set when it is arranged in order. It is applicable to quantitative data and also to ordinal qualitative data (qualitative data that can be ordered or ranked), as it requires the ability to sort the data.

Mode: The mode is the value or category that occurs most frequently in the data set. It is applicable to all types of data, including nominal qualitative data (data that represents categories with no inherent order, like 'favourite colour' or 'preferred mode of transport'), as it only requires counting the occurrences of each category.

The examples provided, 'favourite colour' and 'preferred mode of transport', are typical examples of nominal qualitative data. We cannot calculate a mean or median for such data because the values are not numerical and cannot be meaningfully ordered (e.g., "Red" is not numerically greater or less than "Blue", and we cannot average them).

However, we can count how many times each colour or transport mode is chosen. The colour or transport mode chosen by the highest number of people is the mode.

Therefore, the mode is the appropriate measure of central tendency for qualitative data like 'favourite colour' or 'preferred mode of transport'.

The correct option is (C).

(C) Mode


Question 36. The sum of the deviations of observations from their mean is always:

(A) Positive

(B) Negative

(C) Zero

(D) Equal to the variance

Answer:


Let the set of observations be $x_1, x_2, \dots, x_n$.

Let the arithmetic mean of these observations be $\bar{x}$.

The mean is calculated as $\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$.

This formula can be rearranged to $\sum_{i=1}^{n} x_i = n \bar{x}$.

The deviation of an observation $x_i$ from the mean $\bar{x}$ is given by $(x_i - \bar{x})$.

The sum of the deviations of all observations from the mean is the sum of these individual deviations:

Sum of deviations $= \sum_{i=1}^{n} (x_i - \bar{x})$

Using the property of summation that the sum of differences is the difference of sums, we can write:

$\sum_{i=1}^{n} (x_i - \bar{x}) = \sum_{i=1}^{n} x_i - \sum_{i=1}^{n} \bar{x}$

We know that $\sum_{i=1}^{n} x_i$ is the sum of all observations, which is equal to $n\bar{x}$.

The term $\sum_{i=1}^{n} \bar{x}$ means adding the constant value $\bar{x}$ a total of $n$ times, which is equal to $n \times \bar{x} = n\bar{x}$.

Substituting these values back into the equation for the sum of deviations:

Sum of deviations $= (n\bar{x}) - (n\bar{x})$

Sum of deviations $= 0$

Thus, the sum of the deviations of observations from their mean is always zero.

Comparing this result with the given options, we find that 0 corresponds to option (C).

(C) Zero
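
As a concrete check of this property, the sketch below uses the daily wage data from Question 29 (mean $337$). Exact fractions are used so the deviations sum to exactly zero without floating-point error; the variable names are assumptions for illustration.

```python
from fractions import Fraction

wages = [300, 350, 320, 300, 400, 350, 380, 300, 320, 350]

# Mean as an exact fraction.
mean = Fraction(sum(wages), len(wages))

# Deviation of each observation from the mean.
deviations = [w - mean for w in wages]

total_deviation = sum(deviations)
print(mean, deviations, total_deviation)
```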


Question 37. If the mean and median of a moderately skewed distribution are 10 and 12 respectively, find the approximate mode using the empirical formula.

(A) 14

(B) 16

(C) 10

(D) 8

Answer:


For a moderately skewed distribution, the empirical relationship between the mean, median, and mode is given by the formula:

Mode $\approx$ 3 Median - 2 Mean

We are given:

Mean = 10

Median = 12

Substitute these values into the empirical formula to find the approximate mode:

Mode $\approx 3 \times (\text{Median}) - 2 \times (\text{Mean})$

Mode $\approx 3 \times 12 - 2 \times 10$

Mode $\approx 36 - 20$

Mode $\approx 16$

The approximate mode of the distribution is 16.

Comparing this result with the given options, we find that 16 corresponds to option (B).

(B) 16
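
The substitution above can be sketched in Python. The function name `empirical_mode` is hypothetical. The last line also checks the equivalent form Mean - Mode = 3 (Mean - Median) for these values.

```python
def empirical_mode(mean, median):
    """Approximate mode from the empirical relation Mode = 3 Median - 2 Mean."""
    return 3 * median - 2 * mean

mean, median = 10, 12
mode = empirical_mode(mean, median)

# Equivalent rearrangement: Mean - Mode = 3 (Mean - Median).
equivalent_form_holds = (mean - mode) == 3 * (mean - median)
print(mode, equivalent_form_holds)
```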


Question 38. Which of the following is a correct statement about the relationship between mean, median, and mode?

(A) Mean = Mode in a symmetrical distribution.

(B) Median = Mean in a skewed distribution.

(C) Mode = Median in a skewed distribution.

(D) Mean, Median, and Mode are always equal.

Answer:


The relationship between the mean, median, and mode depends on the shape of the distribution of the data.

1. Symmetrical Distribution:

In a perfectly symmetrical distribution with a single peak (like the normal distribution), the mean, median, and mode are all located at the center and are equal to each other.

Mean = Median = Mode

2. Skewed Distribution:

In a skewed distribution, the mean, median, and mode are generally different from each other.

If the distribution is positively (right) skewed, the tail is longer on the right side, and the order is typically Mode < Median < Mean. The mean is pulled towards the higher values in the tail.

If the distribution is negatively (left) skewed, the tail is longer on the left side, and the order is typically Mean < Median < Mode. The mean is pulled towards the lower values in the tail.

Now let's evaluate the given options:

(A) Mean = Mode in a symmetrical distribution. This statement is consistent with the properties of symmetrical distributions.

(B) Median = Mean in a skewed distribution. This is generally false. In a skewed distribution, the mean and median are typically different. They would only be equal in the specific case of a symmetrical distribution, which is not skewed.

(C) Mode = Median in a skewed distribution. This is also generally false. In a skewed distribution, the mode and median are typically different.

(D) Mean, Median, and Mode are always equal. This is false. They are only equal in perfectly symmetrical distributions. In skewed distributions, they are different.

Based on the properties of distributions, the correct statement is that the Mean equals the Mode (and also the Median) in a symmetrical distribution.

The correct option is (A).

(A) Mean = Mode in a symmetrical distribution.


Question 39. The class intervals $10-20, 21-31, 32-42$ are examples of $\dots$ class intervals.

(A) Exclusive

(B) Inclusive

(C) Continuous

(D) Discrete

Answer:


In grouped frequency distributions, class intervals can be formed in different ways. Two common types are exclusive and inclusive class intervals.

Exclusive Class Intervals: In this type, the upper limit of a class is excluded from the class and is included in the next class. The intervals are typically written like $0-10, 10-20, 20-30, \dots$. There are no gaps between consecutive classes, making them suitable for continuous data.

Inclusive Class Intervals: In this type, both the lower limit and the upper limit are included in the same class interval. The intervals are typically written like $0-10, 11-20, 21-30, \dots$. There is a gap between the upper limit of one class and the lower limit of the next class (e.g., between 10 and 11, 20 and 21).

The given class intervals are $10-20, 21-31, 32-42$.

Let's examine the first interval, $10-20$. This interval includes values from 10 up to and including 20.

The second interval is $21-31$. This interval includes values from 21 up to and including 31.

Notice that there is a gap between the upper limit of the first class (20) and the lower limit of the second class (21). Similarly, there is a gap between 31 and 32.

This structure, where both limits are included within the same class and there are gaps between classes, is characteristic of inclusive class intervals.

Comparing this with the given options:

(A) Exclusive intervals do not have gaps between classes and exclude the upper limit.

(B) Inclusive intervals include both limits and have gaps between classes.

(C) Continuous and (D) Discrete refer to the nature of the data, although exclusive intervals are typically used for continuous data and inclusive for discrete data. The question asks about the intervals themselves.

Based on the structure of the given intervals, they are inclusive.

The correct option is (B).

(B) Inclusive


Question 40. Case Study: A survey recorded the age of 20 patients in a hospital:

25, 32, 45, 18, 55, 32, 45, 60, 18, 25,
45, 32, 55, 60, 25, 18, 45, 32, 55, 45

If a grouped frequency distribution table is formed with classes $10-20, 20-30, \dots$, what is the frequency of the class $40-50$?

(A) $||||$

(B) $\bcancel{||||}$

(C) $\bcancel{||||} \quad |$

(D) $\bcancel{||||} \quad ||$

Answer:


We are given a data set of the ages of 20 patients.

The data set is: 25, 32, 45, 18, 55, 32, 45, 60, 18, 25, 45, 32, 55, 60, 25, 18, 45, 32, 55, 45.

We need to find the frequency of the class interval $40-50$ in a grouped frequency distribution with exclusive classes like $10-20, 20-30, \dots$.

In exclusive class intervals, the upper limit is not included in the class. So, the class $40-50$ includes all ages that are greater than or equal to 40 and strictly less than 50 ($40 \leq \text{age} < 50$).

Let's go through the data and count the ages that fall within this range ($40 \leq \text{age} < 50$):

- 25: not in $40-50$

- 32: not in $40-50$

- 45: in $40-50$

- 18: not in $40-50$

- 55: not in $40-50$

- 32: not in $40-50$

- 45: in $40-50$

- 60: not in $40-50$ (it is $\geq 50$)

- 18: not in $40-50$

- 25: not in $40-50$

- 45: in $40-50$

- 32: not in $40-50$

- 55: not in $40-50$

- 60: not in $40-50$

- 25: not in $40-50$

- 18: not in $40-50$

- 45: in $40-50$

- 32: not in $40-50$

- 55: not in $40-50$

- 45: in $40-50$

The ages falling in the class $40-50$ are 45, 45, 45, 45, 45.

There are 5 such ages.

The frequency of the class $40-50$ is 5.

We need to represent the frequency 5 using tally marks as per the given format.

The tally mark for 5 is $\bcancel{||||}$.

Comparing this with the given options:

(A) $||||$ represents 4.

(B) $\bcancel{||||}$ represents 5.

(C) $\bcancel{||||} \quad |$ represents 6.

(D) $\bcancel{||||} \quad ||$ represents 7.

The tally mark representation for a frequency of 5 is $\bcancel{||||}$, which matches option (B).

(B) $\bcancel{||||}$
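
The classification above can be sketched in Python for all classes at once. This is a minimal illustration assuming exclusive classes of width 10 starting at 10, so a value equal to an upper limit falls in the next class; the helper `class_of` is a hypothetical name.

```python
from collections import Counter

ages = [25, 32, 45, 18, 55, 32, 45, 60, 18, 25,
        45, 32, 55, 60, 25, 18, 45, 32, 55, 45]

def class_of(value, start=10, width=10):
    """Return the exclusive class (lower, upper) with lower <= value < upper."""
    lower = start + ((value - start) // width) * width
    return (lower, lower + width)

# Frequency of each class interval.
table = Counter(class_of(age) for age in ages)

for lower, upper in sorted(table):
    print(f"{lower}-{upper}", table[(lower, upper)])
```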




Short Answer Type Questions

Question 1. What is 'data' in statistics? Give an example of data you might collect in your classroom.

Answer:

In statistics, 'data' refers to a collection of facts, such as numbers, words, measurements, observations, or just a description of things. It is the raw material from which statistical information is derived. Data is collected for a specific purpose and needs to be organized and processed to provide insights or answer questions.


An example of data we might collect in a classroom is the height of each student in centimeters. We could record this information for every student and use it to find the average height, the range of heights, or create a graph to visualize the distribution of heights in the class.

Question 2. What is 'frequency' of an observation? Give an example.

Answer:

In statistics, the 'frequency' of an observation is the number of times that particular observation or value appears in a dataset. It tells us how often a specific data point occurs.


Example:

Consider the marks obtained by 10 students in a test (out of 10):

Marks: $7, 5, 8, 7, 9, 5, 6, 7, 8, 5$

To find the frequency of each mark, we count how many times each distinct mark appears:

  • The mark $5$ appears $3$ times. So, the frequency of $5$ is 3.
  • The mark $6$ appears $1$ time. So, the frequency of $6$ is 1.
  • The mark $7$ appears $3$ times. So, the frequency of $7$ is 3.
  • The mark $8$ appears $2$ times. So, the frequency of $8$ is 2.
  • The mark $9$ appears $1$ time. So, the frequency of $9$ is 1.

The frequency of each observation ($5, 6, 7, 8, 9$) indicates how many students scored that particular mark.

Question 3. What is the purpose of an ungrouped frequency distribution table?

Answer:

An ungrouped frequency distribution table is a way to organize raw data, especially when the number of distinct values in the data set is relatively small.


The main purposes of an ungrouped frequency distribution table are:

1. To organize data: It systematically lists each distinct value or observation from the dataset.

2. To show the frequency: For each distinct value, it shows how many times it occurs in the dataset (its frequency).

3. To summarize data: It provides a concise summary of the data, making it easier to see the distribution of values at a glance.

4. To make data understandable: Raw, unorganized data can be difficult to interpret. The table makes the data easier to understand and analyze.

In essence, it converts raw data into a structured format that highlights the occurrence pattern of each individual data point.

Question 4. What is a 'class interval' in a grouped frequency distribution? Give an example.

Answer:

In a grouped frequency distribution, when the data set is large and has a wide range of values, it is often impractical to list the frequency of each individual observation. Instead, the data is grouped into ranges of values.


A 'class interval' is one such range into which the data is grouped. Each class interval has a lower limit and an upper limit, and all observations falling within that range are counted together. The use of class intervals helps in summarizing large amounts of data into a more manageable form, making it easier to analyze the distribution pattern.


Example:

Suppose we have the marks of 50 students in a test (out of 100).

Instead of listing the frequency of each individual mark (e.g., frequency of 45, frequency of 62, etc.), we can group the marks into class intervals.

A possible set of class intervals could be:

  • $0 - 20$
  • $20 - 40$
  • $40 - 60$
  • $60 - 80$
  • $80 - 100$

In the class interval $40 - 60$, for instance, $40$ is the lower limit and $60$ is the upper limit. Under the exclusive convention, all students whose marks are at least $40$ and less than $60$ would be counted in this interval.

A grouped frequency distribution table would then show the number of students (frequency) falling into each of these intervals.

Question 5. In the class interval $30-40$, what are the lower limit, upper limit, and class mark?

Answer:

For the class interval $30-40$:


1. The lower limit is the smallest value in the interval, which is $30$.


2. The upper limit is the largest value in the interval, which is $40$.


3. The class mark (also called the midpoint) is the average of the lower and upper limits. It is calculated using the formula:

Class Mark $ = \frac{\text{Lower Limit} + \text{Upper Limit}}{2}$

For the interval $30-40$, the class mark is:

$ = \frac{30 + 40}{2}$

$ = \frac{70}{2}$

$ = 35$
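
The class mark formula, together with the class size (upper limit minus lower limit), can be sketched in Python as below. Both function names are hypothetical helpers introduced only for illustration.

```python
def class_mark(lower, upper):
    """Midpoint of a class interval."""
    return (lower + upper) / 2

def class_size(lower, upper):
    """Width of a class interval."""
    return upper - lower

mark = class_mark(30, 40)
size = class_size(30, 40)
print(mark, size)
```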

Question 6. What is a cumulative frequency distribution table? What is the difference between 'less than' cumulative frequency and 'more than' cumulative frequency?

Answer:

A cumulative frequency distribution table is a table that shows the cumulative frequency of each class interval. The cumulative frequency for a class interval is the sum of the frequencies of that class and all classes below it (or above it, depending on the type of cumulative frequency).

The purpose of a cumulative frequency distribution table is to show the running total of frequencies, which helps in determining the number or percentage of observations that fall below or above a certain value.


There are two main types of cumulative frequency:

1. 'Less than' cumulative frequency: This shows the total frequency of all observations that are less than the upper limit of a given class interval. It is calculated by successively adding the frequencies from the lowest class interval upwards.

2. 'More than' cumulative frequency: This shows the total frequency of all observations that are greater than or equal to the lower limit of a given class interval. It is calculated by successively subtracting the frequencies from the total frequency downwards, or by summing frequencies from the highest class interval downwards.


Difference:

  • 'Less than' cumulative frequency accumulates frequencies from the bottom up, referring to values below the upper class limits. The cumulative frequency of the last class equals the total number of observations.
  • 'More than' cumulative frequency accumulates frequencies from the top down (conceptually), referring to values above or equal to the lower class limits. The cumulative frequency of the first class (or the value corresponding to the lowest lower limit) equals the total number of observations.

Question 7. What is a histogram? Why are there no gaps between the bars in a histogram?

Answer:

A histogram is a graphical representation used to display the frequency distribution of continuous data. It consists of a set of adjacent rectangular bars, where the base of each bar represents a class interval on the horizontal axis, and the height of each bar represents the frequency of observations falling within that class interval on the vertical axis.


There are no gaps between the bars in a histogram because it is used for continuous data. The class intervals are contiguous, meaning one interval ends exactly where the next one begins (e.g., $0-10$, $10-20$, $20-30$, etc.). The upper boundary of one class interval serves as the lower boundary of the subsequent class interval. Since the data is continuous, there are no gaps between the values covered by the intervals, and this is reflected visually by having no gaps between the bars representing these intervals.

Question 8. What is a frequency polygon? How is it related to a histogram?

Answer:

A frequency polygon is a line graph used to display the frequency distribution of continuous data. It is constructed by plotting points corresponding to the class marks (midpoints) of each class interval on the horizontal axis and the frequencies of those intervals on the vertical axis. These points are then connected by straight lines.


Relation to a histogram:

A frequency polygon can be drawn from a histogram. To do this:

1. Mark the midpoint of the top of each bar in the histogram. The horizontal position of this midpoint is the class mark of that class interval.

2. Plot a point at the class mark and the corresponding frequency for each bar.

3. To make the polygon close on the horizontal axis and represent the total frequency, plot two additional points: one at the class mark of the class immediately preceding the first class (with frequency 0) and one at the class mark of the class immediately succeeding the last class (with frequency 0).

4. Connect all these plotted points with straight line segments.

Alternatively, a frequency polygon can be drawn directly from a frequency distribution table without first constructing a histogram, by plotting class marks against frequencies.


Both histograms and frequency polygons are used to visualize the shape and distribution of continuous data, but the frequency polygon provides a smoother representation, which can be useful for comparing distributions of different datasets.

Question 9. Define the 'mean' of ungrouped data. Write the formula for calculating the mean.

Answer:

The mean (or arithmetic mean) of ungrouped data is a measure of central tendency. It is calculated by summing all the individual observations in the dataset and then dividing the sum by the total number of observations.

It represents the average value of the data points and is sensitive to extreme values (outliers).


The formula for calculating the mean of ungrouped data is:

Mean $ = \frac{\text{Sum of all observations}}{\text{Total number of observations}}$

Using mathematical notation, if we have $n$ observations denoted by $x_1, x_2, x_3, ..., x_n$, the mean (often denoted by $\overline{x}$) is given by:

$\overline{x} = \frac{x_1 + x_2 + x_3 + ... + x_n}{n}$

This can also be written using summation notation as:

$\overline{x} = \frac{\sum_{i=1}^{n} x_i}{n}$

Question 10. Find the mean of the following data: $10, 15, 20, 25, 30$.

Answer:

The given data is ungrouped data.

The formula for the mean of ungrouped data is:

Mean $ = \frac{\text{Sum of all observations}}{\text{Total number of observations}}$


Given observations: $10, 15, 20, 25, 30$


Sum of all observations $ = 10 + 15 + 20 + 25 + 30$

$ = 100$


Total number of observations $n = 5$


Mean $ = \frac{100}{5}$

$ = 20$


The mean of the given data is $20$.
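
As a quick cross-check, the same calculation can be sketched in Python. The function name `mean_of` is an illustrative choice and does not come from the chapter.

```python
def mean_of(observations):
    """Arithmetic mean: sum of all observations / number of observations."""
    if not observations:
        raise ValueError("the mean of an empty data set is undefined")
    return sum(observations) / len(observations)


data = [10, 15, 20, 25, 30]
print(mean_of(data))  # 100 / 5 = 20.0
```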

Question 11. Define the 'median' of ungrouped data. How is it found for an odd number of observations?

Answer:

The median of ungrouped data is the middle value of the dataset when the observations are arranged in either ascending or descending order. It is a measure of central tendency that divides the data into two equal halves, meaning $50\%$ of the observations are less than or equal to the median, and $50\%$ are greater than or equal to the median.


Finding the Median for an Odd Number of Observations:

When the total number of observations ($n$) in the dataset is odd, the median is the single middle value after the data has been ordered.

The steps to find the median are:

1. Arrange the observations in ascending or descending order.

2. Identify the position of the middle observation. The position of the median for an odd number of observations is given by the formula:

$ \text{Median Position} = \frac{n+1}{2} $

3. The median is the observation located at this position in the ordered list.


Example:

Consider the data: $12, 15, 9, 20, 18$

1. Arrange in ascending order: $9, 12, 15, 18, 20$

2. The total number of observations is $n = 5$, which is odd.

3. The position of the median is $\frac{5+1}{2} = \frac{6}{2} = 3$rd position.

4. The observation at the 3rd position in the ordered list ($9, 12, \textbf{15}, 18, 20$) is $15$.

Therefore, the median of this dataset is $15$.

Question 12. Find the median of the data: $7, 9, 12, 15, 18, 20, 25$.

Answer:

The given data is: $7, 9, 12, 15, 18, 20, 25$.


The data is already arranged in ascending order.


The total number of observations is $n = 7$.


Since the number of observations ($n=7$) is odd, the median is the value at the position $\frac{n+1}{2}$.

Position of Median $ = \frac{7+1}{2} = \frac{8}{2} = 4$th position.


In the ordered data ($7, 9, 12, 15, 18, 20, 25$), the observation at the 4th position is $15$.


Therefore, the median of the data is $15$.

Question 13. How is the median found for an even number of observations?

Answer:

When the total number of observations ($n$) in an ungrouped dataset is even, there are two middle values after the data has been ordered. The median is calculated as the average (mean) of these two middle values.


The steps to find the median for an even number of observations are:

1. Arrange the observations in either ascending or descending order.

2. Identify the positions of the two middle observations. For an even number of observations $n$, the two middle positions are $\frac{n}{2}$ and $\frac{n}{2} + 1$.

3. Find the values of the observations at these two middle positions in the ordered list.

4. Calculate the median by taking the average (mean) of these two middle values.

The formula for the median is:

$ \text{Median} = \frac{\text{Value at position } (\frac{n}{2}) + \text{Value at position } (\frac{n}{2} + 1)}{2} $


Example:

Consider the data: $10, 14, 8, 22, 16, 12$

1. Arrange in ascending order: $8, 10, 12, 14, 16, 22$

2. The total number of observations is $n = 6$, which is even.

3. The positions of the two middle values are $\frac{6}{2} = 3$rd and $\frac{6}{2} + 1 = 4$th.

4. The values at the 3rd and 4th positions in the ordered list ($8, 10, \textbf{12}, \textbf{14}, 16, 22$) are $12$ and $14$.

5. Calculate the median:

$ \text{Median} = \frac{12 + 14}{2} = \frac{26}{2} = 13 $

Therefore, the median of this dataset is $13$.

Question 14. Find the median of the data: $4, 6, 8, 10, 12, 14$.

Answer:

The given data is: $4, 6, 8, 10, 12, 14$.


The data is already arranged in ascending order.


The total number of observations is $n = 6$.


Since the number of observations ($n=6$) is even, the median is the average of the values at positions $\frac{n}{2}$ and $\frac{n}{2} + 1$.

Position 1 $ = \frac{6}{2} = 3$rd position.

Position 2 $ = \frac{6}{2} + 1 = 3 + 1 = 4$th position.


The value at the 3rd position is $8$.

The value at the 4th position is $10$.


Median $ = \frac{\text{Value at 3rd position} + \text{Value at 4th position}}{2}$

$ = \frac{8 + 10}{2}$

$ = \frac{18}{2}$

$ = 9$


Therefore, the median of the data is $9$.
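
The odd and even cases from Questions 11 to 14 can be combined into one small routine. The following is a minimal Python sketch; the name `median_of` is hypothetical, and the only shift from the hand method is that Python counts positions from 0 rather than from 1.

```python
def median_of(observations):
    """Median of ungrouped data using the odd/even position rules."""
    ordered = sorted(observations)  # step 1: arrange in ascending order
    n = len(ordered)
    if n == 0:
        raise ValueError("the median of an empty data set is undefined")
    middle = n // 2
    if n % 2 == 1:
        # Odd n: position (n + 1) / 2, which is index n // 2 when counting from 0.
        return ordered[middle]
    # Even n: average of positions n/2 and n/2 + 1 (indices middle - 1 and middle).
    return (ordered[middle - 1] + ordered[middle]) / 2


print(median_of([12, 15, 9, 20, 18]))      # 15
print(median_of([10, 14, 8, 22, 16, 12]))  # 13.0
print(median_of([4, 6, 8, 10, 12, 14]))    # 9.0
```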

Question 15. Define the 'mode' of ungrouped data. Can a data set have more than one mode?

Answer:

The mode of ungrouped data is the observation that appears most frequently in the dataset. In other words, it is the value that has the highest frequency.

The mode is another measure of central tendency and is useful for finding the most common value in a dataset, particularly for categorical or discrete data.


Yes, a data set can have more than one mode.

  • If two or more observations share the same highest frequency, the dataset has more than one mode.
  • If there are exactly two modes, it is called bimodal.
  • If there are more than two modes, it is called multimodal.
  • If all observations in a dataset have the same frequency (e.g., each observation appears only once), then the dataset has no mode.


Examples:

  • Data: $2, 3, 3, 4, 5$. The value $3$ appears most frequently (twice). The mode is $3$. (Unimodal)
  • Data: $2, 3, 3, 4, 4, 5$. The values $3$ and $4$ both appear twice, which is the highest frequency. The modes are $3$ and $4$. (Bimodal)
  • Data: $2, 2, 3, 3, 4, 4, 5$. The values $2, 3,$ and $4$ each appear twice, which is the highest frequency, while $5$ appears only once. The modes are $2, 3,$ and $4$. (Multimodal)
  • Data: $2, 3, 4, 5, 6$. Each value appears only once. There is no mode.

Question 16. Find the mode of the data: $2, 3, 5, 2, 4, 2, 3, 5, 2, 1, 5$.

Answer:

To find the mode of ungrouped data, we need to find the observation that occurs most frequently.


Let's list the distinct observations and count their frequencies:

  • Observation $1$: appears $1$ time.
  • Observation $2$: appears $4$ times.
  • Observation $3$: appears $2$ times.
  • Observation $4$: appears $1$ time.
  • Observation $5$: appears $3$ times.

The observation with the highest frequency is $2$, which appears $4$ times.


Therefore, the mode of the given data is $2$.
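
The counting method used here can be sketched in Python with `collections.Counter`. The function name `modes_of` is hypothetical. The sketch assumes the convention stated in Question 15: when several distinct values all share one frequency, the data set is reported as having no mode.

```python
from collections import Counter


def modes_of(observations):
    """Return all values with the highest frequency (empty list if no mode)."""
    counts = Counter(observations)  # maps each value to its frequency
    if not counts:
        return []
    highest = max(counts.values())
    # Assumed convention: several distinct values, all equally frequent -> no mode.
    if len(counts) > 1 and all(c == highest for c in counts.values()):
        return []
    return sorted(value for value, c in counts.items() if c == highest)


print(modes_of([2, 3, 5, 2, 4, 2, 3, 5, 2, 1, 5]))  # [2]
print(modes_of([2, 3, 3, 4, 4, 5]))                 # [3, 4]
print(modes_of([2, 3, 4, 5, 6]))                    # []
```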

Question 17. The mean of 5 numbers is 30. If four of the numbers are $25, 28, 32, 35$, find the fifth number.

Answer:

Let the five numbers be $n_1, n_2, n_3, n_4,$ and $n_5$.

We are given that four of the numbers are $25, 28, 32, 35$. Let these be $n_1, n_2, n_3, n_4$.

Let the fifth number, $n_5$, be $x$.


The number of observations is $N = 5$.

The mean of the 5 numbers is given as $30$.


The formula for the mean is:

Mean $ = \frac{\text{Sum of all observations}}{\text{Total number of observations}}$

$ 30 = \frac{n_1 + n_2 + n_3 + n_4 + n_5}{5} $

$ 30 = \frac{25 + 28 + 32 + 35 + x}{5} $


Sum of the four given numbers $ = 25 + 28 + 32 + 35$

$ = 120 $


Substitute the sum into the mean formula:

$ 30 = \frac{120 + x}{5} $


Multiply both sides by $5$:

$ 30 \times 5 = 120 + x $

$ 150 = 120 + x $


Subtract $120$ from both sides to find $x$:

$ x = 150 - 120 $

$ x = 30 $


The fifth number is $30$.
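
The same reasoning (required total minus known total) can be written as a short Python sketch. The function name `missing_observation` and its parameters are illustrative assumptions.

```python
def missing_observation(mean, count, known_values):
    """Find the one unknown observation from the mean and the other values."""
    if len(known_values) != count - 1:
        raise ValueError("exactly one observation must be unknown")
    required_total = mean * count  # the sum that all observations must have
    return required_total - sum(known_values)


print(missing_observation(30, 5, [25, 28, 32, 35]))  # 150 - 120 = 30
```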

Question 18. Give one situation where the median is a more appropriate measure of central tendency than the mean.

Answer:

The median is a more appropriate measure of central tendency than the mean when the dataset contains extreme values or outliers. These extreme values can significantly skew the mean, pulling it away from the typical or central value of the majority of the data.


The median, being the middle value in an ordered dataset, is not affected by the magnitude of the extreme values, only by their position in the ordered list. Therefore, the median provides a better representation of the "center" of the data in the presence of outliers.


Situation Example:

Consider the salaries of a small group of employees in a company:

Salaries ($\textsf{₹}$): $25,000, 30,000, 35,000, 40,000, 50,000, 60,000, 5,000,000$


Let's calculate the mean and the median:

Mean:

Sum of salaries $ = 25000 + 30000 + 35000 + 40000 + 50000 + 60000 + 5000000 = 5240000$

Number of employees $n = 7$

Mean $ = \frac{5240000}{7} \approx 748571.43$

The mean salary is approximately $\textsf{₹}748,571.43$.


Median:

Arrange the salaries in ascending order:

$25,000, 30,000, 35,000, 40,000, 50,000, 60,000, 5,000,000$

Number of observations $n=7$ (odd).

Median is the value at the $\frac{n+1}{2} = \frac{7+1}{2} = 4$th position.

The salary at the 4th position is $\textsf{₹}40,000$.

The median salary is $\textsf{₹}40,000$.


In this example, one extremely high salary ($\textsf{₹}5,000,000$) significantly inflated the mean. The mean salary of $\textsf{₹}748,571.43$ does not accurately represent the typical salary for most employees in this group. The median salary of $\textsf{₹}40,000$ is a much better indicator of the central or typical earning for this group, as most salaries are clustered around this value.
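
To see the effect of the outlier numerically, here is a minimal Python sketch using the salaries above. The variable names are illustrative, and the second half (dropping the largest salary) is an extra check that is not part of the original answer.

```python
from statistics import median

salaries = [25_000, 30_000, 35_000, 40_000, 50_000, 60_000, 5_000_000]

mean_salary = sum(salaries) / len(salaries)
print(round(mean_salary, 2))  # 748571.43
print(median(salaries))       # 40000

# Extra check: remove the single extreme salary and recompute both measures.
without_outlier = salaries[:-1]
print(sum(without_outlier) / len(without_outlier))  # 40000.0
print(median(without_outlier))                      # 37500.0
```

The mean changes by several hundred thousand rupees when one value is removed, while the median moves only from 40,000 to 37,500.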

Question 19. Give one situation where the mode is a more appropriate measure of central tendency than the mean or median.

Answer:

The mode is a more appropriate measure of central tendency when dealing with categorical data or when you want to find the most frequently occurring item, category, or characteristic in a dataset. In such cases, the mean and median may not be meaningful or even possible to calculate.


The mean is only applicable to numerical data, and the median requires the data to be ordered, which is not always possible or meaningful for categorical data. The mode, however, can be found for any type of data (numerical or categorical) and directly tells us the most common observation.


Situation Example:

Suppose a survey was conducted among students to find out their favorite color. The data collected might look like this:

Colors: Red, Blue, Green, Red, Yellow, Blue, Red, Green, Blue, Red


We can find the frequency of each color:

  • Red: $4$ times
  • Blue: $3$ times
  • Green: $2$ times
  • Yellow: $1$ time

In this dataset, the concept of a mean or median color is meaningless (you cannot add colors or find a middle color in an ordered list). However, we can easily find the mode, which is the color that appears most often.

The color 'Red' appears most frequently (4 times).

Therefore, the mode is Red.


The mode tells us the most popular or preferred color among the surveyed students, which is exactly the kind of information needed in this situation. This makes the mode the most appropriate measure of central tendency here.
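
For categorical data like this, the count can also be reproduced in Python. This is a small sketch using the standard library; the variable names are illustrative.

```python
from collections import Counter
from statistics import mode

colors = ["Red", "Blue", "Green", "Red", "Yellow",
          "Blue", "Red", "Green", "Blue", "Red"]

counts = Counter(colors)
print(counts.most_common())  # [('Red', 4), ('Blue', 3), ('Green', 2), ('Yellow', 1)]
print(mode(colors))          # Red
```

Note that the mode is found by counting labels only; no arithmetic on the values themselves is involved, which is why it works where a mean does not.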

Question 20. The marks obtained by students in a test are: 15, 18, 16, 15, 17, 15, 18, 16, 15. Find the mean, median, and mode of these marks.

Answer:

Given Data:

The marks obtained by students are: $15, 18, 16, 15, 17, 15, 18, 16, 15$.

The number of observations is $n = 9$.


To Find:

Mean, Median, and Mode of the given data.


Solution:

1. Mean:

The mean of ungrouped data is given by the formula:

Mean $ = \frac{\text{Sum of all observations}}{\text{Total number of observations}} $

Sum of observations $ = 15 + 18 + 16 + 15 + 17 + 15 + 18 + 16 + 15 = 145 $

Mean $ = \frac{145}{9} $

The mean of the data is $\frac{145}{9}$ (or approximately $16.11$).


2. Median:

To find the median, we first arrange the data in ascending order:

$15, 15, 15, 15, 16, 16, 17, 18, 18$

The number of observations is $n = 9$, which is odd.

The median is the value at the $\frac{n+1}{2}$ position.

Position of Median $ = \frac{9+1}{2} = \frac{10}{2} = 5$th position.

The value at the 5th position in the ordered list is $16$.

The median of the data is $16$.


3. Mode:

The mode is the observation that appears most frequently in the data.

Let's count the frequency of each observation:

  • $15$: appears $4$ times
  • $16$: appears $2$ times
  • $17$: appears $1$ time
  • $18$: appears $2$ times

The observation $15$ has the highest frequency ($4$).

The mode of the data is $15$.
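
All three results can be verified with Python's standard library. This is a hedged sketch: `Fraction` is used only to keep the mean exact as $\frac{145}{9}$, and the variable names are illustrative.

```python
from fractions import Fraction
from statistics import median, multimode

marks = [15, 18, 16, 15, 17, 15, 18, 16, 15]

exact_mean = Fraction(sum(marks), len(marks))
print(exact_mean)                   # 145/9
print(round(float(exact_mean), 2))  # 16.11
print(median(marks))                # 16
print(multimode(marks))             # [15]
```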

Question 21. What is the empirical relationship between mean, median, and mode?

Answer:

The empirical relationship between mean, median, and mode is an approximate formula that holds true for moderately skewed distributions (distributions that are not perfectly symmetrical but are not heavily skewed either). It is not a mathematically derived theorem but an observation based on many datasets.


The relationship is given by the formula:

$ \text{Mode} \approx 3 \times \text{Median} - 2 \times \text{Mean} $


This formula suggests that if you know any two of the three measures (mean, median, or mode) for a moderately skewed distribution, you can estimate the third one using this relationship.

For symmetrical distributions with a single peak (like the normal distribution), the mean, median, and mode are equal, so the relationship holds exactly: substituting Median $=$ Mean gives $ \text{Mode} = 3 \times \text{Mean} - 2 \times \text{Mean} = \text{Mean}$. For moderately skewed distributions, the relationship provides a reasonable approximation.
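
As an illustration only, the relationship can be tried on the marks from Question 20 (mean $\frac{145}{9}$, median $16$, actual mode $15$). The function name `estimate_mode` is hypothetical. With only nine observations the estimate should be read as a rough guide, not an exact result.

```python
def estimate_mode(mean, median):
    """Empirical estimate: Mode is approximately 3 * Median - 2 * Mean."""
    return 3 * median - 2 * mean


# Values taken from Question 20: mean = 145/9, median = 16.
estimate = estimate_mode(145 / 9, 16)
print(round(estimate, 2))  # 15.78, close to the actual mode of 15
```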

Question 22. Can the mean, median, and mode of a data set be the same? Give an example.

Answer:

Yes, the mean, median, and mode of a data set can be the same. This happens, for example, when the data are distributed symmetrically about a single most frequent central value, as in a normal distribution.


Example:

Consider the following dataset: $5, 5, 5, 5, 5$


Mean:

Sum of observations $ = 5 + 5 + 5 + 5 + 5 = 25$

Number of observations $n = 5$

Mean $ = \frac{25}{5} = 5$


Median:

The data is already in order: $5, 5, 5, 5, 5$

Number of observations $n = 5$ (odd)

Median position $ = \frac{5+1}{2} = 3$rd position.

The value at the 3rd position is $5$.

Median $ = 5$


Mode:

The observation $5$ appears $5$ times, which is the highest frequency.

Mode $ = 5$


In this example, the mean, median, and mode are all equal to $5$. This is a simple example of a symmetrical distribution where all observations are the same.


Another example with more variation:

Data: $2, 3, 4, 5, 6$

Mean $ = \frac{2+3+4+5+6}{5} = \frac{20}{5} = 4$

Ordered data: $2, 3, \textbf{4}, 5, 6$. Median (3rd position) $ = 4$

Mode: No single value appears most frequently, so there is no mode in the traditional sense, or sometimes it's considered that every value is a mode. However, if we slightly modify the data to have a clear mode at the center:

Data: $2, 3, 4, 4, 4, 5, 6$

Mean $ = \frac{2+3+4+4+4+5+6}{7} = \frac{28}{7} = 4$

Ordered data: $2, 3, 4, \textbf{4}, 4, 5, 6$. Median ($\frac{7+1}{2}=4$th position) $ = 4$

Mode: $4$ appears $3$ times, which is the highest frequency. Mode $ = 4$.

In this modified example, Mean = Median = Mode = 4.

Question 23. The class intervals in a frequency distribution are $10-20, 20-30, 30-40, ...$. Is this an inclusive or exclusive method of classification?

Answer:

The given class intervals are $10-20, 20-30, 30-40, ...$


In this method of classification, the upper limit of a class interval is the lower limit of the next class interval.

For example, in the interval $10-20$, any observation exactly equal to the upper limit ($20$) is typically not included in this class but is included in the next class interval ($20-30$). The values included in the interval $10-20$ are those greater than or equal to $10$ and strictly less than $20$ (i.e., $10 \le x < 20$).


This method, where the upper boundary is excluded from the interval and included in the next one, is called the exclusive method of classification.


In contrast, the inclusive method uses intervals like $10-19, 20-29, 30-39, ...$, where both the lower and upper limits are included in the respective class interval.


Therefore, the classification $10-20, 20-30, 30-40, ...$ is the exclusive method of classification.
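
The rule $10 \le x < 20$ can be made concrete with a short Python sketch. The function name `class_interval_for` and its default arguments (first lower limit $10$, class width $10$) are assumptions chosen to match the intervals in this question.

```python
def class_interval_for(value, lower=10, width=10):
    """Return (lower_limit, upper_limit) of the exclusive class containing value."""
    if value < lower:
        raise ValueError("value lies below the first class interval")
    index = int((value - lower) // width)  # how many whole classes to skip
    start = lower + index * width
    return (start, start + width)


print(class_interval_for(19.5))  # (10, 20)
print(class_interval_for(20))    # (20, 30): the upper limit belongs to the next class
```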

Question 24. What information can you obtain from the height of a bar in a histogram?

Answer:

In a histogram, the height of a bar represents the frequency of the data that falls within the specific class interval corresponding to the base of that bar.


More precisely, the height of the bar indicates how many observations or data points from the dataset are included in the range of values defined by the class interval of that bar.


For histograms with equal class widths, the area of the bar is proportional to the frequency, but the height is directly equal to the frequency. For histograms with unequal class widths, the height of the bar is adjusted (often representing frequency density), but it still provides information related to the concentration of data within the interval.

Question 25. How do you calculate the class mark of a class interval?

Answer:

The class mark (or midpoint) of a class interval is calculated as the average of its lower limit and upper limit.


The formula for calculating the class mark is:

$ \text{Class Mark} = \frac{\text{Lower Limit of the class interval} + \text{Upper Limit of the class interval}}{2} $


Example:

For the class interval $50-60$:

Lower Limit $ = 50$

Upper Limit $ = 60$

Class Mark $ = \frac{50 + 60}{2} = \frac{110}{2} = 55$


The class mark represents the central value of the class interval and is used, for example, when calculating the mean of grouped data or when plotting a frequency polygon.



Long Answer Type Questions

Question 1. The following data shows the marks obtained by 20 students in a Mathematics test (out of 50): $35, 40, 28, 32, 45, 30, 40, 28, 35, 42, 38, 40, 45, 32, 30, 35, 40, 42, 38, 35$. Prepare an ungrouped frequency distribution table for this data.

Answer:

To prepare an ungrouped frequency distribution table, we need to list each distinct mark obtained by the students and count how many times each mark appears in the given data.


The given marks are: $35, 40, 28, 32, 45, 30, 40, 28, 35, 42, 38, 40, 45, 32, 30, 35, 40, 42, 38, 35$.


Let's count the frequency of each distinct mark:

  • Mark 28: Appears 2 times.
  • Mark 30: Appears 2 times.
  • Mark 32: Appears 2 times.
  • Mark 35: Appears 4 times.
  • Mark 38: Appears 2 times.
  • Mark 40: Appears 4 times.
  • Mark 42: Appears 2 times.
  • Mark 45: Appears 2 times.

Now we can organize this information into an ungrouped frequency distribution table:

Marks | Frequency
28 | 2
30 | 2
32 | 2
35 | 4
38 | 2
40 | 4
42 | 2
45 | 2
Total | 20
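
The tally can be cross-checked with a few lines of Python. This is a minimal sketch using `collections.Counter`; the variable names are illustrative.

```python
from collections import Counter

marks = [35, 40, 28, 32, 45, 30, 40, 28, 35, 42,
         38, 40, 45, 32, 30, 35, 40, 42, 38, 35]

frequency = Counter(marks)  # maps each distinct mark to its frequency

print("Marks  Frequency")
for mark in sorted(frequency):
    print(f"{mark:>5}  {frequency[mark]:>9}")
print(f"Total  {sum(frequency.values()):>9}")
```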

Question 2. The electricity bills (in $\textsf{₹}$) of 25 houses in a locality are given below: $350, 400, 380, 420, 350, 400, 420, 380, 450, 500, 480, 450, 400, 380, 350, 420, 450, 480, 500, 400, 420, 450, 380, 350, 400$. Construct a grouped frequency distribution table with class intervals like $350-400, 400-450$, etc.

Answer:

We need to construct a grouped frequency distribution table for the given electricity bills using the exclusive method of classification with class intervals $350-400, 400-450$, and so on. This means values equal to the upper limit are included in the next interval.


The given data consists of 25 electricity bills:

$350, 400, 380, 420, 350, 400, 420, 380, 450, 500, 480, 450, 400, 380, 350, 420, 450, 480, 500, 400, 420, 450, 380, 350, 400$


Let's determine the class intervals and count the frequency for each interval:

  • $350 - 400$: Includes bills from $\textsf{₹}350$ up to (but not including) $\textsf{₹}400$.
    Bills: $350, 380, 350, 380, 380, 350, 380, 350$
    Frequency: 8
  • $400 - 450$: Includes bills from $\textsf{₹}400$ up to (but not including) $\textsf{₹}450$.
    Bills: $400, 420, 400, 420, 400, 420, 400, 420, 400$
    Frequency: 9
  • $450 - 500$: Includes bills from $\textsf{₹}450$ up to (but not including) $\textsf{₹}500$.
    Bills: $450, 480, 450, 450, 480, 450$
    Frequency: 6
  • $500 - 550$: Includes bills from $\textsf{₹}500$ up to (but not including) $\textsf{₹}550$.
    Bills: $500, 500$
    Frequency: 2

Now we can construct the grouped frequency distribution table:

Electricity Bill ($\textsf{₹}$) | Number of Houses (Frequency)
$350 - 400$ | 8
$400 - 450$ | 9
$450 - 500$ | 6
$500 - 550$ | 2
Total | 25
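
The grouping can be checked with a short Python sketch. The helper `grouped_frequencies` is hypothetical; it assumes equal class widths and the exclusive method, so a bill equal to an upper limit is counted in the next class.

```python
bills = [350, 400, 380, 420, 350, 400, 420, 380, 450, 500,
         480, 450, 400, 380, 350, 420, 450, 480, 500, 400,
         420, 450, 380, 350, 400]


def grouped_frequencies(values, start, width, classes):
    """Count values into exclusive classes [start, start + width), and so on."""
    counts = [0] * classes
    for value in values:
        index = (value - start) // width  # which class the value falls in
        if not 0 <= index < classes:
            raise ValueError(f"{value} lies outside the class intervals")
        counts[index] += 1
    return counts


counts = grouped_frequencies(bills, start=350, width=50, classes=4)
for i, count in enumerate(counts):
    lower = 350 + 50 * i
    print(f"{lower} - {lower + 50}: {count}")
```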

Question 3. For the grouped frequency distribution table constructed in Question 2, prepare a cumulative frequency distribution table (both 'less than' and 'more than' types).

Answer:

The grouped frequency distribution table from Question 2 is:

Electricity Bill ($\textsf{₹}$) | Frequency
$350 - 400$ | 8
$400 - 450$ | 9
$450 - 500$ | 6
$500 - 550$ | 2
Total | 25

'Less Than' Cumulative Frequency Distribution Table:

This table shows the number of houses with electricity bills less than the upper limit of each class interval.

Electricity Bill ($\textsf{₹}$) | Less Than Cumulative Frequency
Less than 400 | $8$
Less than 450 | $8 + 9 = 17$
Less than 500 | $17 + 6 = 23$
Less than 550 | $23 + 2 = 25$

'More Than' Cumulative Frequency Distribution Table:

This table shows the number of houses with electricity bills greater than or equal to the lower limit of each class interval.

Electricity Bill ($\textsf{₹}$) | More Than Cumulative Frequency
More than or equal to 350 | $8 + 9 + 6 + 2 = 25$
More than or equal to 400 | $9 + 6 + 2 = 17$
More than or equal to 450 | $6 + 2 = 8$
More than or equal to 500 | $2$
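
Both running totals can be computed with `itertools.accumulate`. This is a minimal Python sketch with illustrative variable names: the 'less than' column adds frequencies from the first class onward, and the 'more than' column adds them from the last class backward.

```python
from itertools import accumulate

frequencies = [8, 9, 6, 2]  # classes 350-400, 400-450, 450-500, 500-550

# 'Less than' type: running total from the lowest class upward.
less_than = list(accumulate(frequencies))

# 'More than' type: running total from the highest class downward,
# then reversed so it lines up with the original class order.
more_than = list(accumulate(reversed(frequencies)))[::-1]

print(less_than)  # [8, 17, 23, 25]
print(more_than)  # [25, 17, 8, 2]
```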

Question 4. Draw a histogram for the grouped frequency distribution table created in Question 2. Remember to label the axes and choose an appropriate scale. What information does the height of each bar represent?

Answer:

To draw a histogram for the grouped frequency distribution table from Question 2, we use the class intervals on the horizontal axis and the frequencies on the vertical axis.


The grouped frequency distribution table is:

Electricity Bill ($\textsf{₹}$) | Frequency (Number of Houses)
$350 - 400$ | 8
$400 - 450$ | 9
$450 - 500$ | 6
$500 - 550$ | 2

Steps to draw the Histogram:

1. Draw the horizontal axis (x-axis) and the vertical axis (y-axis).

2. Mark the class boundaries ($350, 400, 450, 500, 550$) on the horizontal axis. Label the horizontal axis as "Electricity Bill ($\textsf{₹}$)".

3. Choose a suitable scale for the vertical axis to represent the frequencies (0, 1, 2, ..., up to at least 9). Label the vertical axis as "Frequency (Number of Houses)".

4. Draw rectangles (bars) above each class interval on the horizontal axis. The width of each bar is the class width ($400-350=50$, $450-400=50$, etc., which is uniform). The height of each bar should be equal to the frequency of the corresponding class interval.

  • Draw a bar for the interval $350-400$ with height $8$.
  • Draw a bar for the interval $400-450$ with height $9$.
  • Draw a bar for the interval $450-500$ with height $6$.
  • Draw a bar for the interval $500-550$ with height $2$.

Since this is a grouped frequency distribution using the exclusive method, the bars will be adjacent with no gaps between them, as the upper limit of one class is the lower limit of the next.


Information from the Height of Each Bar:

In a histogram with equal class widths, the height of each bar directly represents the frequency of the observations falling within that specific class interval.

For example, the height of the bar for the interval $350-400$ is 8, which means there are 8 houses with electricity bills of at least $\textsf{₹}350$ but less than $\textsf{₹}400$. The height of the bar for $400-450$ is 9, indicating that 9 houses have bills of at least $\textsf{₹}400$ but less than $\textsf{₹}450$, and so on.

Question 5. Explain the steps to draw a frequency polygon for a grouped frequency distribution. Draw a frequency polygon for the data given in Question 2.

Answer:

A frequency polygon is a graphical representation of a grouped frequency distribution. It is drawn by plotting points corresponding to the class marks of the intervals and their frequencies, and then connecting these points with line segments.


Steps to Draw a Frequency Polygon from a Grouped Frequency Distribution:

1. Find the Class Marks: Calculate the class mark (midpoint) for each class interval using the formula: Class Mark $ = \frac{\text{Lower Limit} + \text{Upper Limit}}{2}$.

2. Plot Points: On a graph, plot points where the horizontal axis represents the class marks and the vertical axis represents the frequencies of the corresponding class intervals.

3. Connect Points: Connect the plotted points with straight line segments.

4. Close the Polygon: To make the frequency polygon a closed figure and have the total area represent the total frequency, plot two additional points on the horizontal axis (with a frequency of 0). These points are the class marks of hypothetical class intervals immediately preceding the first class and immediately succeeding the last class, assuming they have zero frequency. Connect the first plotted point to the class mark of the preceding interval and the last plotted point to the class mark of the succeeding interval.


Drawing a Frequency Polygon for the Data from Question 2:

The grouped frequency distribution table from Question 2 is:

Electricity Bill ($\textsf{₹}$) | Frequency
$350 - 400$ | 8
$400 - 450$ | 9
$450 - 500$ | 6
$500 - 550$ | 2

1. Find the Class Marks:

  • For $350-400$: Class Mark $ = \frac{350+400}{2} = \frac{750}{2} = 375$
  • For $400-450$: Class Mark $ = \frac{400+450}{2} = \frac{850}{2} = 425$
  • For $450-500$: Class Mark $ = \frac{450+500}{2} = \frac{950}{2} = 475$
  • For $500-550$: Class Mark $ = \frac{500+550}{2} = \frac{1050}{2} = 525$

2. Determine points to plot:

Based on the class marks and frequencies, the points are: $(375, 8), (425, 9), (475, 6), (525, 2)$.


3. Determine points to close the polygon:

The class width is $400 - 350 = 50$.

  • Preceding class mark: $375 - 50 = 325$. Point: $(325, 0)$
  • Succeeding class mark: $525 + 50 = 575$. Point: $(575, 0)$

4. Draw the Polygon:

On a graph:

  • Draw the horizontal axis (x-axis) and label it "Electricity Bill ($\textsf{₹}$)". Mark the class marks $325, 375, 425, 475, 525, 575$ on this axis.
  • Draw the vertical axis (y-axis) and label it "Frequency (Number of Houses)". Choose an appropriate scale (e.g., 0 to 10).
  • Plot the points: $(325, 0), (375, 8), (425, 9), (475, 6), (525, 2), (575, 0)$.
  • Connect the points $(325, 0)$ to $(375, 8)$, $(375, 8)$ to $(425, 9)$, $(425, 9)$ to $(475, 6)$, $(475, 6)$ to $(525, 2)$, and $(525, 2)$ to $(575, 0)$ with straight line segments.

This forms the frequency polygon for the given data.
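
The list of points to plot can be generated with a short Python sketch. The function name `polygon_points` is hypothetical, and the sketch assumes equal class widths so that the two closing points sit one class width beyond the first and last class marks. It only computes the coordinates; drawing them still has to be done on graph paper or with a plotting tool.

```python
def polygon_points(intervals, frequencies):
    """Return the (class mark, frequency) points, including both closing points."""
    class_marks = [(lower + upper) / 2 for lower, upper in intervals]
    width = intervals[0][1] - intervals[0][0]  # assumes equal class widths
    points = list(zip(class_marks, frequencies))
    first = (class_marks[0] - width, 0)   # class mark of the preceding empty class
    last = (class_marks[-1] + width, 0)   # class mark of the succeeding empty class
    return [first] + points + [last]


intervals = [(350, 400), (400, 450), (450, 500), (500, 550)]
frequencies = [8, 9, 6, 2]

print(polygon_points(intervals, frequencies))
# [(325.0, 0), (375.0, 8), (425.0, 9), (475.0, 6), (525.0, 2), (575.0, 0)]
```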

Question 6. The scores obtained by 10 students in a test are: $12, 15, 18, 11, 15, 17, 13, 15, 19, 14$. Find the mean, median, and mode of these scores. Explain how you calculated each measure.

Answer:

Given Data:

The scores obtained by 10 students are: $12, 15, 18, 11, 15, 17, 13, 15, 19, 14$.

The number of observations is $n = 10$.


To Find:

Mean, Median, and Mode.


Solution:

1. Mean:

The mean is the average of all observations. The formula for the mean of ungrouped data is:

Mean $ = \frac{\text{Sum of all observations}}{\text{Total number of observations}} $

Sum of observations $ = 12 + 15 + 18 + 11 + 15 + 17 + 13 + 15 + 19 + 14 $

$ = 149 $

Total number of observations $n = 10$.

Mean $ = \frac{149}{10} $

$ = 14.9 $

The mean score is $14.9$.


2. Median:

The median is the middle value when the data is arranged in order. Since the number of observations ($n=10$) is even, the median is the average of the two middle values.

First, arrange the data in ascending order:

$11, 12, 13, 14, 15, 15, 15, 17, 18, 19$

The two middle positions are $\frac{n}{2}$ and $\frac{n}{2} + 1$, which are $\frac{10}{2} = 5$th and $\frac{10}{2} + 1 = 6$th positions.

The value at the 5th position is $15$.

The value at the 6th position is $15$.

Median $ = \frac{\text{Value at 5th position} + \text{Value at 6th position}}{2} $

$ = \frac{15 + 15}{2} $

$ = \frac{30}{2} $

$ = 15 $

The median score is $15$.


3. Mode:

The mode is the observation that appears most frequently in the data.

Let's count the frequency of each score:

  • 11: 1 time
  • 12: 1 time
  • 13: 1 time
  • 14: 1 time
  • 15: 3 times
  • 17: 1 time
  • 18: 1 time
  • 19: 1 time

The score $15$ appears most frequently ($3$ times).

The mode score is $15$.
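The three results above can be cross-checked with a short Python sketch that uses the standard `statistics` module. The variable names (`scores`, `mean_score`, and so on) are our own choices for illustration and are not part of the question.

```python
from statistics import mean, median, mode

# Scores of the 10 students, exactly as given in the question.
scores = [12, 15, 18, 11, 15, 17, 13, 15, 19, 14]

mean_score = mean(scores)      # sum of observations / number of observations
median_score = median(scores)  # n is even, so the two middle ordered values are averaged
mode_score = mode(scores)      # the most frequent observation

print("Mean:", mean_score)
print("Median:", median_score)
print("Mode:", mode_score)
```

The printed values should agree with the hand calculation: mean $14.9$, median $15$, and mode $15$.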

Question 7. The runs scored by a batsman in 11 innings are: $60, 45, 80, 55, 90, 45, 50, 70, 45, 65, 50$. Find the mean, median, and mode of the runs scored. Which measure best describes the batsman's typical performance?

Answer:

Given Data:

The runs scored by a batsman in 11 innings are: $60, 45, 80, 55, 90, 45, 50, 70, 45, 65, 50$.

The number of innings is $n = 11$.


To Find:

Mean, Median, and Mode. Also, determine which measure best describes the typical performance.


Solution:

1. Mean:

Mean $ = \frac{\text{Sum of runs}}{\text{Number of innings}} $

Sum of runs $ = 60 + 45 + 80 + 55 + 90 + 45 + 50 + 70 + 45 + 65 + 50 $

$ = 655 $

Mean $ = \frac{655}{11} $

$ \approx 59.55 $

The mean runs scored is approximately $59.55$.


2. Median:

Arrange the runs in ascending order:

$45, 45, 45, 50, 50, 55, 60, 65, 70, 80, 90$

The number of innings is $n = 11$, which is odd.

Median position $ = \frac{n+1}{2} = \frac{11+1}{2} = 6$th position.

The run scored at the 6th position in the ordered list is $55$.

The median runs scored is $55$.


3. Mode:

The mode is the run scored that appears most frequently.

Let's count the frequency of each score:

  • 45: 3 times
  • 50: 2 times
  • 55: 1 time
  • 60: 1 time
  • 65: 1 time
  • 70: 1 time
  • 80: 1 time
  • 90: 1 time

The run scored $45$ appears most frequently ($3$ times).

The mode runs scored is $45$.


Best Measure of Typical Performance:

In this dataset, the mean ($59.55$), median ($55$), and mode ($45$) are all different.

  • The Mode (45) tells us the score that the batsman got most often, which was a relatively low score compared to his higher scores. This might not represent his overall performance well if he also scores much higher sometimes.
  • The Mean (59.55) is affected by the high score of 90, which pulls the average upwards.
  • The Median (55) is the middle value. It tells us that the batsman scored 55 runs or fewer in at least half of the innings, and 55 runs or more in at least half of the innings. It is not significantly affected by the highest score.

Considering the spread of scores and the presence of a relatively high score (90), the median (55) is arguably the measure that best describes the batsman's typical performance. It represents the central tendency without being overly influenced by the highest score, providing a more balanced view than the mode and less susceptible to skew than the mean in this case.
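To illustrate why the median is less sensitive to a high score than the mean, the sketch below recomputes both measures after replacing the score of 90 with 190. The value 190 is purely hypothetical and is not part of the question; it is used only to exaggerate the effect.

```python
from statistics import mean, median

# Runs in the 11 innings, as given in the question.
runs = [60, 45, 80, 55, 90, 45, 50, 70, 45, 65, 50]

# Hypothetical variation (not in the question): the 90 becomes 190.
runs_extreme = [190 if r == 90 else r for r in runs]

mean_before, median_before = mean(runs), median(runs)
mean_after, median_after = mean(runs_extreme), median(runs_extreme)

print("Original data:   mean", round(mean_before, 2), "median", median_before)
print("With 190 for 90: mean", round(mean_after, 2), "median", median_after)
```

Under this assumed change the mean rises from about $59.55$ to about $68.64$, while the median stays at $55$, because the 6th value in the ordered list does not change.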

Question 8. Explain the difference between mean, median, and mode as measures of central tendency. Discuss the advantages and disadvantages of each measure and when each is most appropriate to use.

Answer:

Introduction:

Mean, median, and mode are the three most common measures of central tendency. They are single values used to represent the center or typical value of a dataset. While all aim to describe the "center", they do so in different ways and are sensitive to different aspects of the data distribution.


1. Mean (Arithmetic Mean):

Definition: The mean is the average of a dataset. It is calculated by summing all the observations and dividing by the total number of observations.

Calculation (Ungrouped Data): $\text{Mean} (\overline{x}) = \frac{\text{Sum of all observations}}{\text{Total number of observations}} = \frac{\sum x_i}{n}$

Advantages:

  • Uses all observations in the dataset.
  • Is a good representative of the data for symmetrical distributions.
  • Is suitable for further mathematical calculations and statistical analysis.

Disadvantages:

  • Highly affected by extreme values (outliers). A single outlier can significantly skew the mean.
  • Cannot be calculated for qualitative or categorical data.

When Appropriate to Use:

  • When the data is quantitative (numerical).
  • When the distribution of the data is symmetrical or approximately normal.
  • When you want a measure that reflects the contribution of every data point.

2. Median:

Definition: The median is the middle value of a dataset when the observations are arranged in either ascending or descending order. It divides the dataset into two equal halves.

Calculation (Ungrouped Data):

1. Arrange the data in order.

2. If the number of observations ($n$) is odd, the median is the value at the $\frac{n+1}{2}$-th position.

3. If the number of observations ($n$) is even, the median is the average of the values at the $\frac{n}{2}$-th and $(\frac{n}{2} + 1)$-th positions.

Advantages:

  • Largely unaffected by extreme values (outliers), since it depends only on the middle position(s) of the ordered data.
  • Can be calculated for quantitative data and some types of qualitative data (ordinal data).
  • Provides a good sense of the "typical" value when the data is skewed.

Disadvantages:

  • Does not use all observations in its calculation (only relies on the middle value(s)).
  • Less suitable for further mathematical or statistical analysis compared to the mean.

When Appropriate to Use:

  • When the data is quantitative.
  • When the data distribution is skewed (e.g., salaries, property values).
  • When the dataset contains outliers.
  • For ordinal qualitative data.

3. Mode:

Definition: The mode is the observation that occurs most frequently in the dataset. A dataset can have one mode (unimodal), more than one mode (multimodal), or no mode.

Calculation (Ungrouped Data): Count the frequency of each distinct observation. The observation(s) with the highest frequency is the mode.

Advantages:

  • Can be used for both quantitative and qualitative (categorical) data.
  • Easy to understand and identify.
  • Not affected by extreme values.

Disadvantages:

  • May not exist for a dataset (when all frequencies are equal).
  • May not be unique (multimodal).
  • Provides information only about the most frequent value, ignoring other values.
  • Less useful for highly spread out data with no clear peak.

When Appropriate to Use:

  • When dealing with categorical data (e.g., favorite color, marital status).
  • When you want to identify the most common value or category in the dataset.
  • For discrete numerical data where identifying the most frequent value is meaningful.

Summary of Differences:

  • Mean: The average value; affected by every observation, including outliers; suitable for symmetrical quantitative data.
  • Median: The middle value; resistant to outliers; suitable for skewed quantitative data and ordinal data.
  • Mode: The most frequent value; can be used for all types of data (quantitative and qualitative); identifies peaks in frequency.
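The position rule for the median and the possibility of more than one mode can be sketched in Python as follows. The function name `median_by_position` and the three small data sets are invented for this illustration; `multimode` is from the standard `statistics` module (Python 3.8 or later).

```python
from statistics import multimode

def median_by_position(values):
    """Median via the position rule: the (n+1)/2-th value for odd n,
    the average of the n/2-th and (n/2 + 1)-th values for even n."""
    ordered = sorted(values)
    n = len(ordered)
    if n % 2 == 1:
        # Positions are 1-based, list indices are 0-based.
        return ordered[(n + 1) // 2 - 1]
    return (ordered[n // 2 - 1] + ordered[n // 2]) / 2

# Illustrative data sets, invented for this sketch.
odd_case = [7, 3, 9, 5, 1]       # n = 5, ordered: 1, 3, 5, 7, 9
even_case = [7, 3, 9, 5]         # n = 4, ordered: 3, 5, 7, 9
two_modes = [2, 2, 3, 5, 5, 8]   # 2 and 5 both occur twice

print(median_by_position(odd_case))    # 3rd ordered value
print(median_by_position(even_case))   # average of 2nd and 3rd ordered values
print(multimode(two_modes))            # every value with the highest frequency
```

Note that `multimode` returns every value when all frequencies are equal, whereas the convention used in this answer is to say that such a dataset has no mode.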

Question 9. The daily wages (in $\textsf{₹}$) of 30 workers in a factory are given below:

250 280 260 250 280 300 250 260 280 300
250 260 280 300 250 260 280 300 250 260
280 300 250 260 280 300 250 260 280 300

Construct a grouped frequency distribution table using class intervals like $250-260, 260-270$, etc. (exclusive). Draw a histogram for this distribution.

Answer:

Given Data:

The daily wages (in $\textsf{₹}$) of 30 workers are: $250, 280, 260, 250, 280, 300, 250, 260, 280, 300, 250, 260, 280, 300, 250, 260, 280, 300, 250, 260, 280, 300, 250, 260, 280, 300, 250, 260, 280, 300$.

Total number of workers (observations) is $30$.


To Construct:

1. A grouped frequency distribution table using exclusive class intervals ($250-260, 260-270$, etc.).

2. A histogram for this distribution.


Construction of Grouped Frequency Distribution Table:

We will use exclusive class intervals, where the upper limit of an interval is not included in that interval but in the next one. The intervals are $250-260, 260-270, 270-280, 280-290, 290-300, 300-310$, etc.

Let's tally the frequencies for each interval:

  • $250-260$ ($\textsf{₹}250 \le \text{Wage} < \textsf{₹}260$): Count the number of wages that are 250. Wages: $250, 250, 250, 250, 250, 250, 250, 250$ Frequency: 8
  • $260-270$ ($\textsf{₹}260 \le \text{Wage} < \textsf{₹}270$): Count the number of wages that are 260. Wages: $260, 260, 260, 260, 260, 260, 260$ Frequency: 7
  • $270-280$ ($\textsf{₹}270 \le \text{Wage} < \textsf{₹}280$): No wages fall in this range. Frequency: 0
  • $280-290$ ($\textsf{₹}280 \le \text{Wage} < \textsf{₹}290$): Count the number of wages that are 280. Wages: $280, 280, 280, 280, 280, 280, 280, 280$ Frequency: 8
  • $290-300$ ($\textsf{₹}290 \le \text{Wage} < \textsf{₹}300$): No wages fall in this range. Frequency: 0
  • $300-310$ ($\textsf{₹}300 \le \text{Wage} < \textsf{₹}310$): Count the number of wages that are 300. Wages: $300, 300, 300, 300, 300, 300, 300$ Frequency: 7

The grouped frequency distribution table is as follows:

Daily Wages ($\textsf{₹}$) Number of Workers (Frequency)
$250 - 260$ 8
$260 - 270$ 7
$270 - 280$ 0
$280 - 290$ 8
$290 - 300$ 0
$300 - 310$ 7
Total 30
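The tallying for the table above can be reproduced with a minimal Python sketch. It assumes the exclusive convention used in this answer (lower limit included, upper limit excluded); the names `start`, `width`, and `num_classes` are our own.

```python
# Daily wages of the 30 workers, as given in the question.
wages = [
    250, 280, 260, 250, 280, 300, 250, 260, 280, 300,
    250, 260, 280, 300, 250, 260, 280, 300, 250, 260,
    280, 300, 250, 260, 280, 300, 250, 260, 280, 300,
]

start, width, num_classes = 250, 10, 6   # classes 250-260, 260-270, ..., 300-310

frequencies = [0] * num_classes
for wage in wages:
    # Exclusive method: a wage equal to an upper limit falls in the next class,
    # which is exactly what integer division by the class width gives.
    index = (wage - start) // width
    frequencies[index] += 1

for i, f in enumerate(frequencies):
    lower = start + i * width
    print(f"{lower} - {lower + width}: {f}")
print("Total:", sum(frequencies))
```

The frequencies should come out as $8, 7, 0, 8, 0, 7$, with a total of $30$.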

Drawing the Histogram:

To draw the histogram, we will use the class intervals on the horizontal axis and the frequencies on the vertical axis.

1. Draw the horizontal axis (x-axis) and label it "Daily Wages ($\textsf{₹}$)". Mark the class boundaries: $250, 260, 270, 280, 290, 300, 310$. Since the axis doesn't start from 0, you might consider using a kink or zig-zag mark near the origin.

2. Draw the vertical axis (y-axis) and label it "Number of Workers (Frequency)". Choose an appropriate scale for the frequencies (e.g., from 0 to 9), as the maximum frequency is 8.

3. Draw adjacent rectangular bars above each class interval. The width of each bar is the class width ($10$), and the height of each bar is equal to the frequency of that interval.

  • For $250-260$, draw a bar of height $8$.
  • For $260-270$, draw a bar of height $7$.
  • For $270-280$, draw a bar of height $0$ (or no bar).
  • For $280-290$, draw a bar of height $8$.
  • For $290-300$, draw a bar of height $0$ (or no bar).
  • For $300-310$, draw a bar of height $7$.

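Because the class intervals are continuous (the upper limit of each class is the lower limit of the next), the bars of consecutive classes touch with no gaps between them. The classes $270-280$ and $290-300$ have zero frequency, so the histogram shows empty spaces above them; these are bars of height zero, not breaks in the scale.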

Question 10. The following table gives the distribution of students of two sections according to the marks obtained by them:

Marks Frequency (Section A) Frequency (Section B)
0-10 3 5
10-20 9 19
20-30 17 15
30-40 12 10
40-50 9 1

Represent the marks of both sections on the same graph by two frequency polygons. Compare the performance of the two sections based on the graph.

Answer:

Given:

Grouped frequency distribution table for marks of students in two sections, A and B.


To Draw:

Two frequency polygons on the same graph, one for Section A and one for Section B. Compare the performance based on the polygons.


Solution - Construction of Frequency Polygons:

To draw a frequency polygon, we need the class marks (midpoints) of the class intervals. The formula for the class mark is $\frac{\text{Lower Limit} + \text{Upper Limit}}{2}$.

Let's calculate the class marks for the given intervals:

  • $0-10$: Class Mark $ = \frac{0+10}{2} = 5$
  • $10-20$: Class Mark $ = \frac{10+20}{2} = 15$
  • $20-30$: Class Mark $ = \frac{20+30}{2} = 25$
  • $30-40$: Class Mark $ = \frac{30+40}{2} = 35$
  • $40-50$: Class Mark $ = \frac{40+50}{2} = 45$

To close the polygon, we consider hypothetical intervals with zero frequency before the first interval and after the last interval. The class width is $10$.

  • Preceding interval's class mark: $5 - 10 = -5$ (Frequency 0)
  • Succeeding interval's class mark: $45 + 10 = 55$ (Frequency 0)

Now we list the points to be plotted (Class Mark, Frequency) for each section:

Section A: $(-5, 0), (5, 3), (15, 9), (25, 17), (35, 12), (45, 9), (55, 0)$

Section B: $(-5, 0), (5, 5), (15, 19), (25, 15), (35, 10), (45, 1), (55, 0)$


Drawing the Graph:

1. Draw the horizontal axis (x-axis) and label it "Marks (Class Marks)". Mark points for the class marks from $-5$ to $55$ with equal spacing.

2. Draw the vertical axis (y-axis) and label it "Frequency (Number of Students)". Choose a suitable scale for the frequencies (e.g., from 0 to 20), as the maximum frequency is $19$.

3. Plot the points for Section A on the graph and connect them with straight line segments. Use a specific color or line style (e.g., a solid line) for Section A.

4. Plot the points for Section B on the same graph and connect them with straight line segments. Use a different color or line style (e.g., a dashed line) for Section B.

5. Add a legend to the graph to indicate which polygon represents Section A and which represents Section B.


Comparison of Performance:

By observing the two frequency polygons on the same graph, we can compare the performance of the two sections:

  • The peak of the frequency polygon for Section A is at the class mark $25$ (interval $20-30$), indicating that the highest number of students in Section A scored marks between $20$ and $30$. The polygon extends significantly towards higher marks ($30-40$ and $40-50$).
  • The peak of the frequency polygon for Section B is at the class mark $15$ (interval $10-20$), indicating that the highest number of students in Section B scored marks between $10$ and $20$. The polygon for Section B is much higher than that for Section A in the $10-20$ interval ($19$ against $9$) but drops off more sharply for higher marks: it has slightly fewer students in the $30-40$ interval ($10$ against $12$) and far fewer in the $40-50$ interval ($1$ against $9$).

From the graph, it is evident that Section A has a higher concentration of students in the higher mark ranges ($20-30, 30-40, 40-50$) compared to Section B. While Section B has more students in the lower range ($10-20$), Section A's distribution is shifted more towards the right (higher marks).


Conclusion: Based on the frequency polygons, Section A generally shows a better performance than Section B, as more students in Section A have obtained higher marks.
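The comparison can also be checked numerically from the table. The sketch below counts, for each section, the students in the classes from $20-30$ upwards; the helper name `count_at_or_above` is our own, and the threshold of 20 marks is simply the boundary used in the discussion above.

```python
# Frequencies from the table, in class order 0-10, 10-20, 20-30, 30-40, 40-50.
lower_limits = [0, 10, 20, 30, 40]
section_a = [3, 9, 17, 12, 9]
section_b = [5, 19, 15, 10, 1]

def count_at_or_above(threshold, lower_limits, freqs):
    """Number of students in classes whose lower limit is at least `threshold`."""
    return sum(f for low, f in zip(lower_limits, freqs) if low >= threshold)

a_20_plus = count_at_or_above(20, lower_limits, section_a)
b_20_plus = count_at_or_above(20, lower_limits, section_b)

print("Total students:", sum(section_a), sum(section_b))
print("Scoring 20 or more:", a_20_plus, b_20_plus)
```

Both sections have $50$ students, of whom $38$ in Section A and $26$ in Section B scored $20$ marks or more, which agrees with the conclusion drawn from the graph.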

Question 11. Explain the process of calculating the mean from a frequency distribution table for ungrouped data. Find the mean of the following data:

Value (x) Frequency (f)
5 4
10 6
15 8
20 5
25 2

Answer:

Process of Calculating Mean from an Ungrouped Frequency Distribution Table:

When data is presented in an ungrouped frequency distribution table, it means we have a list of distinct observations (values) and the number of times each observation occurs (its frequency). To calculate the mean from such a table, we follow these steps:

1. For each distinct value ($x_i$), multiply the value by its corresponding frequency ($f_i$). This gives us the product $f_i x_i$, which represents the total contribution of that value to the sum of all observations.

2. Calculate the sum of all these products ($ \sum f_i x_i $). This sum represents the total sum of all the original observations in the dataset.

3. Calculate the total number of observations ($ \sum f_i $). This is the sum of all frequencies.

4. Divide the sum of the products ($ \sum f_i x_i $) by the total number of observations ($ \sum f_i $). This gives us the mean.


The formula for the mean from an ungrouped frequency distribution table is:

$ \text{Mean} (\overline{x}) = \frac{\sum (f_i \times x_i)}{\sum f_i} $


Finding the Mean for the Given Data:

The given ungrouped frequency distribution table is:

Value (x) Frequency (f)
5 4
10 6
15 8
20 5
25 2

Let's add a column for $f \times x$:

Value (x) Frequency (f) f $\times$ x
5 4 $4 \times 5 = 20$
10 6 $6 \times 10 = 60$
15 8 $8 \times 15 = 120$
20 5 $5 \times 20 = 100$
25 2 $2 \times 25 = 50$

Now, calculate the sums:

$ \sum f = 4 + 6 + 8 + 5 + 2 = 25 $

$ \sum (f \times x) = 20 + 60 + 120 + 100 + 50 = 350 $


Using the formula for the mean:

$ \text{Mean} = \frac{\sum (f \times x)}{\sum f} = \frac{350}{25} $

$ = 14 $


The mean of the given data is $14$.
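The four steps of the process can be sketched in a few lines of Python. The variable names (`values`, `frequencies`, `sum_fx`, `sum_f`) are our own and mirror the symbols $x_i$, $f_i$, $\sum f_i x_i$, and $\sum f_i$.

```python
# Values and frequencies from the given table.
values = [5, 10, 15, 20, 25]
frequencies = [4, 6, 8, 5, 2]

# Steps 1 and 2: form each product f * x and add the products.
sum_fx = sum(f * x for x, f in zip(values, frequencies))

# Step 3: total number of observations.
sum_f = sum(frequencies)

# Step 4: divide to get the mean.
mean_value = sum_fx / sum_f

print("Sum of f*x:", sum_fx)
print("Sum of f:", sum_f)
print("Mean:", mean_value)
```

This should reproduce $\sum (f \times x) = 350$, $\sum f = 25$, and a mean of $14$.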

Question 12. The following data gives the amount of time (in hours) 20 students spent watching television in a week:

10 12 8 15 12 10 14 12 11 10
15 12 13 10 14 12 11 10 13 12

Prepare a grouped frequency distribution table with class intervals $8-10, 10-12$, etc. (exclusive). Draw a histogram and a frequency polygon for this distribution on separate graphs.

Answer:

Given Data:

The amount of time (in hours) 20 students spent watching television: $10, 12, 8, 15, 12, 10, 14, 12, 11, 10, 15, 12, 13, 10, 14, 12, 11, 10, 13, 12$.

Total number of students (observations) is $20$.


To Prepare:

1. A grouped frequency distribution table using exclusive class intervals ($8-10, 10-12$, etc.).

2. A histogram for this distribution.

3. A frequency polygon for this distribution.


Construction of Grouped Frequency Distribution Table:

We will use exclusive class intervals as requested. This means a value equal to the upper limit of an interval is included in the next interval.

  • $8-10$: includes values $\ge 8$ and $< 10$.
  • $10-12$: includes values $\ge 10$ and $< 12$.
  • $12-14$: includes values $\ge 12$ and $< 14$.
  • $14-16$: includes values $\ge 14$ and $< 16$.

Let's count the frequency for each interval from the given data:

Data: $8, 10, 10, 10, 10, 10, 11, 11, 12, 12, 12, 12, 12, 12, 13, 13, 14, 14, 15, 15$ (Ordered for easier counting)

  • $8-10$: The only value $\ge 8$ and $< 10$ is $8$. Frequency = 1.
  • $10-12$: Values $\ge 10$ and $< 12$ are $10, 10, 10, 10, 10, 11, 11$. Frequency = 7.
  • $12-14$: Values $\ge 12$ and $< 14$ are $12, 12, 12, 12, 12, 12, 13, 13$. Frequency = 8.
  • $14-16$: Values $\ge 14$ and $< 16$ are $14, 14, 15, 15$. Frequency = 4.

Total frequency $ = 1 + 7 + 8 + 4 = 20$, which matches the number of students.

The grouped frequency distribution table is:

Time (Hours) Number of Students (Frequency)
$8 - 10$ 1
$10 - 12$ 7
$12 - 14$ 8
$14 - 16$ 4
Total 20

Drawing the Histogram:

To draw a histogram, we plot the class intervals on the horizontal axis and the frequencies on the vertical axis.

1. Draw the horizontal axis (x-axis) and label it "Time (Hours)". Mark the class boundaries: $8, 10, 12, 14, 16$. Since the first class boundary is 8 rather than 0, show a kink (zig-zag mark) on the axis near the origin to indicate that the part from 0 to 8 has been omitted.

2. Draw the vertical axis (y-axis) and label it "Frequency (Number of Students)". Choose an appropriate scale (e.g., from 0 to 9), as the maximum frequency is 8.

3. Draw adjacent rectangular bars above each class interval on the horizontal axis. The width of each bar corresponds to the class width (which is 2 for all intervals: $10-8=2$, $12-10=2$, etc.). The height of each bar is equal to the frequency of that interval.

  • For $8-10$, draw a bar of height $1$.
  • For $10-12$, draw a bar of height $7$.
  • For $12-14$, draw a bar of height $8$.
  • For $14-16$, draw a bar of height $4$.

The bars will be adjacent because the upper limit of one class is the lower limit of the next in this exclusive method.


Drawing the Frequency Polygon:

To draw a frequency polygon, we first find the class marks (midpoints) of each interval. We will plot points using these class marks and their frequencies and connect them with lines.

1. Calculate Class Marks:

  • For $8-10$: Class Mark $ = \frac{8+10}{2} = 9$
  • For $10-12$: Class Mark $ = \frac{10+12}{2} = 11$
  • For $12-14$: Class Mark $ = \frac{12+14}{2} = 13$
  • For $14-16$: Class Mark $ = \frac{14+16}{2} = 15$

2. Plot Points (Class Mark, Frequency): $(9, 1), (11, 7), (13, 8), (15, 4)$.

3. Determine Closing Points: The class width is $2$. We add hypothetical intervals with frequency 0 at the start and end.

  • Class mark before the first interval: $9 - 2 = 7$. Point: $(7, 0)$.
  • Class mark after the last interval: $15 + 2 = 17$. Point: $(17, 0)$.

4. Draw the Polygon:

On a separate graph from the histogram:

  • Draw the horizontal axis (x-axis) and label it "Time (Hours)". Mark the class marks $7, 9, 11, 13, 15, 17$ on this axis.
  • Draw the vertical axis (y-axis) and label it "Frequency (Number of Students)". Use the same scale as for the histogram (e.g., from 0 to 9).
  • Plot the points: $(7, 0), (9, 1), (11, 7), (13, 8), (15, 4), (17, 0)$.
  • Connect these plotted points with straight line segments to form the frequency polygon.
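The list of points for the frequency polygon, including the two closing points, can be generated with a small Python sketch. The function name `polygon_points` is hypothetical, and the sketch assumes that all classes have the same width, as they do here.

```python
def polygon_points(lower_limits, width, freqs):
    """Return (class mark, frequency) points for a frequency polygon,
    with a zero-frequency closing point one class width before the
    first class mark and one class width after the last."""
    # Class mark = (lower limit + upper limit) / 2.
    class_marks = [(low + (low + width)) / 2 for low in lower_limits]
    first, last = class_marks[0], class_marks[-1]
    return [(first - width, 0)] + list(zip(class_marks, freqs)) + [(last + width, 0)]

# Classes 8-10, 10-12, 12-14, 14-16 with their frequencies from the table.
points = polygon_points([8, 10, 12, 14], 2, [1, 7, 8, 4])

for x, y in points:
    print(x, y)
```

The output should list the points $(7, 0), (9, 1), (11, 7), (13, 8), (15, 4), (17, 0)$, with the class marks printed as decimals such as `7.0`.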